Computationally efficient discrimination between language varieties with large feature vectors and regularized classifiers

Adrien Barbaresi


Abstract
The present contribution revolves around efficient approaches to language classification which have been field-tested in the Vardial evaluation campaign. The methods used in several language identification tasks comprising different language types are presented and their results are discussed, giving insights on real-world application of regularization, linear classifiers and corresponding linguistic features. The use of a specially adapted Ridge classifier proved useful in 2 tasks out of 3. The overall approach (XAC) has slightly outperformed most of the other systems on the DFS task (Dutch and Flemish) and on the ILI task (Indo-Aryan languages), while its comparative performance was poorer in on the GDI task (Swiss German dialects).
Anthology ID:
W18-3918
Volume:
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venues:
COLING | VarDial | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
164–171
Language:
URL:
https://www.aclweb.org/anthology/W18-3918
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W18-3918.pdf