Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language
Mo Shen, Wingmui Li, HyunJeong Choe, Chenhui Chu, Daisuke Kawahara, Sadao Kurohashi
Abstract
In this paper, we propose a new annotation approach to Chinese word segmentation, part-of-speech (POS) tagging and dependency labelling that aims to overcome the two major issues in traditional morphology-based annotation: Inconsistency and data sparsity. We re-annotate the Penn Chinese Treebank 5.0 (CTB5) and demonstrate the advantages of this approach compared to the original CTB5 annotation through word segmentation, POS tagging and machine translation experiments.- Anthology ID:
- C16-1029
- Volume:
- Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Venue:
- COLING
- SIG:
- Publisher:
- The COLING 2016 Organizing Committee
- Note:
- Pages:
- 298–308
- Language:
- URL:
- https://www.aclweb.org/anthology/C16-1029
- DOI:
- PDF:
- http://aclanthology.lst.uni-saarland.de/C16-1029.pdf