Comparing CRF and LSTM performance on the task of morphosyntactic tagging of non-standard varieties of South Slavic languages

Nikola Ljubešić


Abstract
This paper presents two systems taking part in the Morphosyntactic Tagging of Tweets shared task on Slovene, Croatian and Serbian data, organized inside the VarDial Evaluation Campaign. While one system relies on the traditional method for sequence labeling (conditional random fields), the other relies on its neural alternative (bidirectional long short-term memory). We investigate the similarities and differences of these two approaches, showing that both methods yield very good and quite similar results, with the neural model outperforming the traditional one more as the level of non-standardness of the text increases. Through an error analysis we show that the neural system is better at long-range dependencies, while the traditional system excels and slightly outperforms the neural system at the local ones. We present in the paper new state-of-the-art results in morphosyntactic annotation of non-standard text for Slovene, Croatian and Serbian.
Anthology ID:
W18-3917
Volume:
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venues:
COLING | VarDial | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
156–163
Language:
URL:
https://www.aclweb.org/anthology/W18-3917
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W18-3917.pdf