Dingmin Wang


2019

Confusionset-guided Pointer Networks for Chinese Spelling Check
Dingmin Wang | Yi Tay | Li Zhong
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper proposes Confusionset-guided Pointer Networks for the Chinese Spelling Check (CSC) task. More concretely, our approach uses an off-the-shelf confusionset to guide character generation. To this end, our novel Seq2Seq model jointly learns to copy a correct character from the input sentence through a pointer network, or to generate a character from the confusionset rather than from the entire vocabulary. We conduct experiments on three human-annotated datasets, and the results demonstrate that our proposed generative model outperforms all competitor models by a large margin of up to 20% F1 score, achieving state-of-the-art performance on all three datasets.
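
The copy-or-generate idea in this abstract can be illustrated with a small sketch. It assumes a generic pointer-generator-style gate `p_gen` that mixes a copy distribution (attention over the input characters) with a generation softmax restricted to the confusionset of the character being checked. The function name, its arguments, and this exact gating form are hypothetical and are not taken from the paper's implementation; the attention weights are assumed to sum to 1.

```python
import math


def confusionset_pointer_distribution(
    source: list[str],
    attention: list[float],
    gen_logits: dict[str, float],
    confusionset: set[str],
    p_gen: float,
) -> dict[str, float]:
    """Return P(output char) as a gated mix of copying and
    confusionset-restricted generation (hypothetical sketch)."""
    # Copy distribution: attention mass summed per source character.
    copy_dist: dict[str, float] = {}
    for ch, weight in zip(source, attention):
        copy_dist[ch] = copy_dist.get(ch, 0.0) + weight

    # Generation distribution: softmax over confusionset members only;
    # characters outside the confusionset get no generation probability.
    candidates = [c for c in confusionset if c in gen_logits]
    gen_dist: dict[str, float] = {}
    if candidates:
        top = max(gen_logits[c] for c in candidates)  # for numerical stability
        exps = {c: math.exp(gen_logits[c] - top) for c in candidates}
        total = sum(exps.values())
        gen_dist = {c: e / total for c, e in exps.items()}
    else:
        p_gen = 0.0  # nothing to generate, so fall back to pure copying

    # Gate the two distributions with p_gen and merge them.
    final: dict[str, float] = {}
    for ch, p in copy_dist.items():
        final[ch] = final.get(ch, 0.0) + (1.0 - p_gen) * p
    for ch, p in gen_dist.items():
        final[ch] = final.get(ch, 0.0) + p_gen * p
    return final
```

For example, when checking 心 in 高心 with `gen_logits={"兴": 2.0, "新": 0.5, "心": 0.0, "的": 5.0}`, confusionset `{"兴", "新", "心"}`, attention `[0.1, 0.9]` and `p_gen=0.7`, the character 的 receives no probability despite its highest raw logit, because it lies outside the confusionset, and 兴 becomes the most likely output. Restricting the softmax this way shrinks the candidate space from the whole vocabulary to a handful of plausible corrections.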

Bridging the Gap: Improve Part-of-speech Tagging for Chinese Social Media Texts with Foreign Words
Dingmin Wang | Meng Fang | Yan Song | Juntao Li
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)

2018

A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check
Dingmin Wang | Yan Song | Jing Li | Jialong Han | Haisong Zhang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Chinese spelling check (CSC) is a challenging yet meaningful task, which not only serves as a preprocessing step in many natural language processing (NLP) applications, but also facilitates reading and understanding of running texts in people's daily lives. However, a major limitation in applying data-driven approaches to CSC is the lack of sufficient annotated corpora for training models. In this paper, we propose a novel approach to constructing a CSC corpus with automatically generated spelling errors, which are either visually or phonologically similar characters, corresponding to OCR- and ASR-based methods, respectively. On the constructed corpus, different models are trained and evaluated for CSC on three standard test sets. Experimental results demonstrate the effectiveness of the corpus, thereby confirming the validity of our approach.
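
The core idea of producing training pairs by swapping characters for visually or phonologically similar ones can be sketched as below. This is a deliberate simplification: the paper derives its errors from OCR- and ASR-based methods, whereas this sketch substitutes characters from tiny hand-written lookup tables. The tables `VISUAL` and `PHONOLOGICAL`, the function `corrupt`, and the `rate` parameter are all hypothetical.

```python
import random

# Hypothetical toy confusion tables. In the paper's approach, visually similar
# errors come from OCR and phonologically similar errors from ASR; this sketch
# replaces both with fixed lookup tables.
VISUAL = {"己": ["已", "巳"], "未": ["末"]}
PHONOLOGICAL = {"在": ["再"], "的": ["得", "地"]}


def corrupt(sentence: str, rate: float, rng: random.Random) -> tuple[str, list[int]]:
    """Replace each confusable character with probability `rate`.

    Returns the corrupted sentence and the changed positions, so that
    (corrupted, original, positions) can serve as one CSC training example.
    """
    chars = list(sentence)
    positions: list[int] = []
    for i, ch in enumerate(chars):
        candidates = VISUAL.get(ch, []) + PHONOLOGICAL.get(ch, [])
        if candidates and rng.random() < rate:
            chars[i] = rng.choice(candidates)
            positions.append(i)
    return "".join(chars), positions
```

For example, `corrupt("我在学校的图书馆", 1.0, random.Random(0))` replaces 在 (position 1) and 的 (position 4) with confusable characters and leaves every other character unchanged. Passing a seeded `random.Random` keeps the generated corpus reproducible.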

2017

NLPTEA 2017 Shared Task – Chinese Spelling Check
Gabriel Fung | Maxime Debosschere | Dingmin Wang | Bo Li | Jia Zhu | Kam-Fai Wong
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)

This paper provides an overview of the Chinese Spelling Check shared task at NLPTEA 2017, along with our findings. The goal of this task is to develop a computer-assisted system that automatically diagnoses typing errors in traditional Chinese sentences written by students. We defined six types of errors belonging to two categories. Given a sentence, the system should detect where the errors are, and for each detected error determine its type and provide correction suggestions. We designed, constructed, and released a benchmark dataset for this task.