Using Universal Dependencies in cross-linguistic complexity research

Aleksandrs Berdicevskis, Çağrı Çöltekin, Katharina Ehret, Kilu von Prince, Daniel Ross, Bill Thompson, Chunxiao Yan, Vera Demberg, Gary Lupyan, Taraka Rama, Christian Bentz


Abstract
We evaluate corpus-based measures of linguistic complexity obtained using Universal Dependencies (UD) treebanks. We propose a method for estimating the robustness of the complexity values obtained with a given measure and a given treebank. The results indicate that measures of syntactic complexity may, on average, be less robust than measures of morphological complexity. We also estimate the validity of complexity measures by comparing the results for very similar languages and checking for unexpected differences. We show that some of these differences can be reduced by using parallel treebanks and, more importantly from a practical point of view, by harmonizing the language-specific solutions in the UD annotation.
Anthology ID: W18-6002
Volume: Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
Month: November
Year: 2018
Address: Brussels, Belgium
Venues: EMNLP | UDW | WS
Publisher: Association for Computational Linguistics
Pages: 8–17
URL: https://www.aclweb.org/anthology/W18-6002
DOI: 10.18653/v1/W18-6002
PDF: http://aclanthology.lst.uni-saarland.de/W18-6002.pdf