Annotation Issues in Universal Dependencies for Korean and Japanese

Ji Yoon Han, Tae Hwan Oh, Lee Jin, Hansaem Kim


Abstract
To investigate issues that arise in the process of developing a Universal Dependency (UD) treebank for Korean and Japanese, we begin by addressing the typological characteristics of Korean and Japanese. Both Korean and Japanese are agglutinative and head-final languages. And the principle of word segmentation for both languages is different from English, which makes it difficult to apply UD guidelines. Following the typological characteristics of the two languages and the issue of UD application, we review the application of UPOS and DEPREL schemes to the two languages. The annotation principles for AUX, ADJ, DET, ADP and PART are discussed for the UPOS scheme, and the annotation principles for case, aux, iobj, and obl are discussed for the DEPREL scheme.
Anthology ID:
2020.udw-1.12
Volume:
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venues:
COLING | UDW
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
99–108
Language:
URL:
https://www.aclweb.org/anthology/2020.udw-1.12
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2020.udw-1.12.pdf