Liner Yang


2020

面向汉语作为第二语言学习的个性化语法纠错(Personalizing Grammatical Error Correction for Chinese as a Second Language)
Shengsheng Zhang (张生盛) | Guina Pang (庞桂娜) | Liner Yang (杨麟儿) | Chencheng Wang (王辰成) | Yongping Du (杜永萍) | Erhong Yang (杨尔弘) | Yaping Huang (黄雅平)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

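Grammatical error correction (GEC) aims to use natural language processing techniques to automatically detect and correct grammatical errors in text, such as word-order and spelling errors. Many current GEC methods for Chinese achieve good results, but they often overlook learners' individual characteristics, such as second-language proficiency level and native-language background. This paper therefore proposes personalized grammatical error correction for learners of Chinese as a second language, correcting the errors made by learners with different characteristics separately, and builds datasets of Chinese learners from different domains for experiments. The experimental results show that adapting the GEC model to each learner domain clearly improves performance.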

基于BERT与柱搜索的中文释义生成(Chinese Definition Modeling Based on BERT and Beam Seach)
Qinan Fan (范齐楠) | Cunliang Kong (孔存良) | Liner Yang (杨麟儿) | Erhong Yang (杨尔弘)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

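Definition generation is the task of generating a definition for a target word. Previous work on Chinese definition generation did not consider the context of the target word. This paper is the first to use the target word's context in Chinese definition generation, and proposes a definition generation model based on BERT and beam search. We construct a Chinese CWN dataset that includes context for our experiments and, in addition to BLEU, use semantic similarity as an additional automatic evaluation metric. The experimental results show that our model achieves significant improvements on both the Chinese CWN dataset and the English Oxford dataset, and the human evaluation results are consistent with the automatic evaluation results. Finally, we conduct an in-depth analysis of generated examples.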
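As a rough illustration of the beam search decoding mentioned in the abstract, the following is a minimal, generic sketch. It is not the paper's model: the BERT-based scorer is replaced by a hypothetical lookup table (`TABLE`), and `beam_search`, `step_fn`, and the toy tokens are names invented for this example. Hypotheses are compared by raw cumulative log-probability, without length normalization.

```python
import math

def beam_search(step_fn, start, end, beam_size=2, max_len=5):
    """Return the best sequence found, keeping `beam_size` hypotheses per step."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        expanded = []
        for tokens, score in beams:
            for token, prob in step_fn(tokens).items():
                expanded.append((tokens + [token], score + math.log(prob)))
        # Keep only the top `beam_size` expansions.
        expanded.sort(key=lambda h: h[1], reverse=True)
        beams = []
        for tokens, score in expanded[:beam_size]:
            (finished if tokens[-1] == end else beams).append((tokens, score))
        if not beams:
            break
    return max(finished + beams, key=lambda h: h[1])[0]

# Hypothetical next-token distributions standing in for a real model.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "x": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}

def toy_step(tokens):
    """Look up the distribution over next tokens given the last token."""
    return TABLE[tokens[-1]]

greedy = beam_search(toy_step, "<s>", "</s>", beam_size=1)
wider = beam_search(toy_step, "<s>", "</s>", beam_size=2)
print(greedy, wider)
```

In this toy table, a beam of size 1 commits to the locally best first token, while a beam of size 2 keeps the alternative alive long enough to find a sequence with higher overall probability.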

汉语学习者依存句法树库构建(Construction of a Treebank of Learner Chinese)
Jialu Shi (师佳璐) | Xinyu Luo (罗昕宇) | Liner Yang (杨麟儿) | Dan Xiao (肖丹) | Zhengsheng Hu (胡正声) | Yijun Wang (王一君) | Jiaxin Yuan (袁佳欣) | Yu Jingsi (余婧思) | Erhong Yang (杨尔弘)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

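A dependency treebank of learner Chinese provides dependency syntactic analyses for corpora written by non-native speakers. It can support second-language teaching and research, and is also important for related research such as syntactic parsing and grammatical error correction for second-language text. However, existing dependency treebanks of learner Chinese are few in number and still have some problems in their annotation. To address this, this paper improves the dependency annotation guidelines, builds an online annotation platform, and carries out dependency annotation of learner Chinese. The paper focuses on issues such as data selection and the annotation workflow, analyzes the quality of the annotation results, and explores the impact of second-language errors on annotation quality and syntactic parsing.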

2019

The BLCU System in the BEA 2019 Shared Task
Liner Yang | Chencheng Wang
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper describes the BLCU Group submissions to the Building Educational Applications (BEA) 2019 Shared Task on Grammatical Error Correction (GEC). The task is to detect and correct grammatical errors in essays. We participate in two tracks: the Restricted Track and the Unrestricted Track. Our system is based on the Transformer architecture and integrates several effective methods proposed in recent years, such as Byte Pair Encoding, model ensembling, checkpoint averaging, and a spell checker. We also corrupt public monolingual data to further improve the model's performance. On the test data of the BEA 2019 Shared Task, our system yields F0.5 = 58.62 and 59.50, ranking twelfth and fourth, respectively.
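For readers unfamiliar with the F0.5 metric reported above, the sketch below computes a general F-beta score from true-positive, false-positive, and false-negative counts. The function name `f_beta` and the example counts are illustrative assumptions and are not taken from the paper or from the shared task's scorer; with beta = 0.5, precision is weighted more heavily than recall.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from counts; beta < 1 weights precision more than recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: 6 correct edits, 2 spurious edits, 6 missed edits.
score = f_beta(tp=6, fp=2, fn=6)
print(round(score, 4))
```

Here precision is 0.75 and recall is 0.5, so the F0.5 score sits closer to the precision than the balanced F1 score would.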

2016

A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing
Xian-Ling Mao | Yi-Jing Hao | Qiang Zhou | Wen-Qing Yuan | Liner Yang | Heyan Huang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Recently, topic modeling has been widely applied in data mining due to its powerful ability. A common, major challenge in applying such topic models to other tasks is to accurately interpret the meaning of each topic. Topic labeling, as a major interpreting method, has attracted significant attention recently. However, most previous work focuses only on the effectiveness of topic labeling, and less attention has been paid to quickly creating good topic descriptors; meanwhile, it is hard to assign labels to newly emerging topics using most existing methods. To solve these problems, we propose a novel fast topic labeling framework that casts the labeling problem as a k-nearest neighbor (KNN) search problem in a probability vector set. Our experimental results show that the proposed sequential interleaving method based on locality sensitive hashing (LSH) is efficient in boosting the comparison speed among probability distributions, and that the proposed framework can generate meaningful labels to interpret topics, including newly emerging topics.
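To make the idea of labeling as nearest-neighbor search over probability vectors more concrete, here is a minimal sketch that uses generic random-hyperplane LSH to shortlist candidates and Hellinger distance to rank them. It does not implement the paper's sequential interleaving method, whose details are not given in the abstract; `LshLabelIndex`, the choice of Hellinger distance, the fallback to a full scan, and the toy label vectors are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

class LshLabelIndex:
    """Buckets labeled probability vectors by random-hyperplane signatures."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = defaultdict(list)
        self.items = []

    def _signature(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(w * x for w, x in zip(plane, vec)) >= 0.0 for plane in self.planes)

    def add(self, label, vec):
        self.items.append((label, vec))
        self.buckets[self._signature(vec)].append((label, vec))

    def query(self, vec, k=1):
        # Only compare against vectors in the same bucket; fall back to a
        # full scan when the bucket holds fewer than k candidates.
        candidates = self.buckets.get(self._signature(vec), [])
        if len(candidates) < k:
            candidates = self.items
        ranked = sorted(candidates, key=lambda item: hellinger(vec, item[1]))
        return [label for label, _ in ranked[:k]]

# Hypothetical candidate labels, each with a distribution over a 4-word vocabulary.
index = LshLabelIndex(dim=4)
index.add("sports", [0.7, 0.1, 0.1, 0.1])
index.add("finance", [0.1, 0.7, 0.1, 0.1])
index.add("health", [0.1, 0.1, 0.7, 0.1])

topic = [0.7, 0.1, 0.1, 0.1]  # word distribution of the topic to label
print(index.query(topic, k=1))
```

The speedup in such a scheme comes from computing distances only within the query's bucket rather than against every candidate; the full-scan fallback trades that speed for a guarantee of returning k labels.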