Gaoqi Rao


2020

pdf bib
中文问句的形式分类和资源建设(Formal classification and resource construction of Chinese questions)
Jiangtao Li (黎江涛) | Gaoqi Rao (饶高琦)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

本文归纳了问句形式在问句语料筛选中的作用,探索了问句分类必需的形式特征,同时通过人工标注建设了中文问句分类语料库,并在此基础上进行了基于规则和统计的分类实验,通过多轮实验迭代优化特征组合形成特征规则集,为当前问答提供形式上的分类基础。实验中,基于优化特征规则集的有限状态自动机可实现宏平均F1值为0.94;统计机器学习中随机森林模型的分类效果较好,F1值宏平均达到0.98,表明问句形式分类具有相当可行性和准确性。

pdf bib
基于组块分析的汉语块依存语法(Chinese Chunk-Based Dependency Grammar)
Qingqing Qian (钱青青) | Chengwen Wang (王诚文) | Gaoqi Rao (饶高琦) | Endong Xun (荀恩东)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

基于词单元的经典依存语法在面向中文的句子分析中遇到诸多汉语特性引起的困难。为此,本文提出汉语的块依存语法,以谓词为核心,以组块为研究对象,在句内和句间寻找谓词所支配的组块,构建句群级别的句法分析框架。这一操作不仅仅是提升叶子节点的语言单位,而且还针对汉语语义特点进行了分析方式和分析规则上的创新,能够较好地解决微观层次的逻辑结构知识,并为中观论元知识和宏观篇章知识打好铺垫。本文主要介绍了块依存语法理念、表示、分析方法及特点,并简要介绍了块依存树库的构建情况。截至目前为止,树库规模为187万字符(超过4万复句、10万小句),其中包含67%新闻文本和32%百科文本。

2018

pdf bib
Overview of NLPTEA-2018 Share Task Chinese Grammatical Error Diagnosis
Gaoqi Rao | Qi Gong | Baolin Zhang | Endong Xun
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

This paper presents the NLPTEA 2018 shared task for Chinese Grammatical Error Diagnosis (CGED) which seeks to identify grammatical error types, their range of occurrence and recommended corrections within sentences written by learners of Chinese as foreign language. We describe the task definition, data preparation, performance metrics, and evaluation results. Of the 20 teams registered for this shared task, 13 teams developed the system and submitted a total of 32 runs. Progress in system performances was obviously, reaching F1 of 36.12% in position level and 25.27% in correction level. All data sets with gold standards and scoring scripts are made publicly available to researchers.

2017

pdf bib
IJCNLP-2017 Task 1: Chinese Grammatical Error Diagnosis
Gaoqi Rao | Baolin Zhang | Endong Xun | Lung-Hao Lee
Proceedings of the IJCNLP 2017, Shared Tasks

This paper presents the IJCNLP 2017 shared task for Chinese grammatical error diagnosis (CGED) which seeks to identify grammatical error types and their range of occurrence within sentences written by learners of Chinese as foreign language. We describe the task definition, data preparation, performance metrics, and evaluation results. Of the 13 teams registered for this shared task, 5 teams developed the system and submitted a total of 13 runs. We expected this evaluation campaign could lead to the development of more advanced NLP techniques for educational applications, especially for Chinese error detection. All data sets with gold standards and scoring scripts are made publicly available to researchers.

2016

pdf bib
Overview of NLP-TEA 2016 Shared Task for Chinese Grammatical Error Diagnosis
Lung-Hao Lee | Gaoqi Rao | Liang-Chih Yu | Endong Xun | Baolin Zhang | Li-Ping Chang
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

This paper presents the NLP-TEA 2016 shared task for Chinese grammatical error diagnosis which seeks to identify grammatical error types and their range of occurrence within sentences written by learners of Chinese as foreign language. We describe the task definition, data preparation, performance metrics, and evaluation results. Of the 15 teams registered for this shared task, 9 teams developed the system and submitted a total of 36 runs. We expected this evaluation campaign could lead to the development of more advanced NLP techniques for educational applications, especially for Chinese error detection. All data sets with gold standards and scoring scripts are made publicly available to researchers.