An Extensive Empirical Evaluation of Character-Based Morphological Tagging for 14 Languages

Georg Heigold, Guenter Neumann, Josef van Genabith


Abstract
This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets. Character-based approaches are attractive because they can handle rare and unseen words gracefully. We evaluate on 14 languages and observe consistent gains over a state-of-the-art morphological tagger across all languages except English and French, where we match the state of the art. We compare two architectures for computing character-based word vectors: recurrent (RNN) and convolutional (CNN) nets. We show that the CNN-based approach performs slightly worse and less consistently than the RNN-based approach. Small but systematic gains are observed when the two architectures are combined by ensembling.
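The abstract contrasts two ways of turning a word's characters into a vector and then combines them by ensembling. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The weights are random stand-ins for trained parameters. The recurrent encoder is a plain Elman RNN, since the paper's cell type is not given here. Each word is tagged in isolation with no sentence-level context. The tag set `TAGS` is hypothetical. "Ensembling" is assumed to mean averaging the two models' per-tag probabilities. All function names are illustrative.

```python
import math
import random

# Hypothetical toy sizes and tag set; not the paper's hyperparameters or tags.
EMB_DIM, HID_DIM, WIDTH = 4, 5, 3
TAGS = ["NOUN|Case=Nom", "NOUN|Case=Acc", "VERB|Tense=Past"]

def rand_matrix(rows, cols, seed):
    """Deterministic random weights standing in for trained parameters."""
    r = random.Random(seed)
    return [[r.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def embed(ch):
    """Character embedding: any character, even an unseen one, gets a vector."""
    r = random.Random(ord(ch))
    return [r.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]

W_CNN = rand_matrix(HID_DIM, WIDTH * EMB_DIM, seed=1)  # convolution filters
W_X = rand_matrix(HID_DIM, EMB_DIM, seed=2)            # RNN input weights
W_H = rand_matrix(HID_DIM, HID_DIM, seed=3)            # RNN recurrent weights
W_OUT_CNN = rand_matrix(len(TAGS), HID_DIM, seed=4)    # tag scorer for CNN model
W_OUT_RNN = rand_matrix(len(TAGS), HID_DIM, seed=5)    # tag scorer for RNN model

def cnn_word_vector(word):
    """Convolve over character windows, then max-pool over positions."""
    padded = "<" * (WIDTH - 1) + word + ">" * (WIDTH - 1)  # boundary padding
    features = []
    for i in range(len(padded) - WIDTH + 1):
        window = [x for ch in padded[i:i + WIDTH] for x in embed(ch)]
        features.append([math.tanh(z) for z in matvec(W_CNN, window)])
    # Max over positions, separately for each filter dimension.
    return [max(col) for col in zip(*features)]

def rnn_word_vector(word):
    """Final hidden state of a plain Elman RNN read left to right."""
    h = [0.0] * HID_DIM
    for ch in word:
        h = [math.tanh(a + b) for a, b in zip(matvec(W_X, embed(ch)), matvec(W_H, h))]
    return h

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def ensemble_tag(word):
    """Average the two models' tag probabilities and return the best tag."""
    p_cnn = softmax(matvec(W_OUT_CNN, cnn_word_vector(word)))
    p_rnn = softmax(matvec(W_OUT_RNN, rnn_word_vector(word)))
    avg = [(a + b) / 2 for a, b in zip(p_cnn, p_rnn)]
    best = max(range(len(TAGS)), key=avg.__getitem__)
    return TAGS[best], avg

tag, probs = ensemble_tag("Häuser")
print(tag, [round(p, 3) for p in probs])
```

Because every character, including one never seen in training, maps to a vector, both encoders produce a word vector for any input string. This is the property the abstract credits for handling rare and unseen words. Max pooling gives the CNN a fixed-size output regardless of word length, and the RNN's final state serves the same purpose.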
Anthology ID: E17-1048
Volume: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
Month: April
Year: 2017
Address: Valencia, Spain
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 505–513
URL: https://www.aclweb.org/anthology/E17-1048
PDF: http://aclanthology.lst.uni-saarland.de/E17-1048.pdf