Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language

Mo Shen, Wingmui Li, HyunJeong Choe, Chenhui Chu, Daisuke Kawahara, Sadao Kurohashi


Abstract
In this paper, we propose a new annotation approach to Chinese word segmentation, part-of-speech (POS) tagging and dependency labelling that aims to overcome the two major issues in traditional morphology-based annotation: Inconsistency and data sparsity. We re-annotate the Penn Chinese Treebank 5.0 (CTB5) and demonstrate the advantages of this approach compared to the original CTB5 annotation through word segmentation, POS tagging and machine translation experiments.
Anthology ID:
C16-1029
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Venue:
COLING
SIG:
Publisher:
The COLING 2016 Organizing Committee
Note:
Pages:
298–308
Language:
URL:
https://www.aclweb.org/anthology/C16-1029
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/C16-1029.pdf