N-gram and Neural Language Models for Discriminating Similar Languages

Andre Cianflone, Leila Kosseim


Abstract
This paper describes our submission to the 2016 Discriminating Similar Languages (DSL) Shared Task. We participated in the closed Sub-task 1 with two separate machine learning techniques. The first approach is a character-based Convolutional Neural Network with an LSTM layer (CLSTM), which achieved an accuracy of 78.45% with minimal tuning. The second approach is a character-based n-gram model of size 7. It achieved an accuracy of 88.45%, which is close to the accuracy of 89.38% achieved by the best submission.
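
As a rough illustration of the second approach, the following minimal sketch trains one character n-gram language model per language and labels a sentence with the language whose model gives it the highest log-likelihood. The abstract states only that the model is character-based with n-grams of size 7. The add-alpha smoothing, the space padding, the maximum-likelihood decision rule, and all names here (char_ngrams, CharNgramClassifier) are assumptions made for illustration and may differ from what the authors actually implemented.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return all character n-grams of text, left-padded with spaces."""
    padded = " " * (n - 1) + text
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramClassifier:
    """One character n-gram language model per label (hypothetical sketch)."""

    def __init__(self, n=7, alpha=1.0):
        self.n = n                # n-gram size; 7 is the size named in the abstract
        self.alpha = alpha        # add-alpha smoothing constant (an assumption)
        self.ngram_counts = {}    # label -> Counter of n-grams
        self.context_counts = {}  # label -> Counter of (n-1)-character contexts
        self.vocab = set()        # characters seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            ngrams = self.ngram_counts.setdefault(label, Counter())
            contexts = self.context_counts.setdefault(label, Counter())
            for gram in char_ngrams(text, self.n):
                ngrams[gram] += 1
                contexts[gram[:-1]] += 1
            self.vocab.update(text)
        return self

    def log_likelihood(self, text, label):
        """Smoothed log P(text | label) under that label's n-gram model."""
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        ngrams = self.ngram_counts[label]
        contexts = self.context_counts[label]
        total = 0.0
        for gram in char_ngrams(text, self.n):
            numerator = ngrams[gram] + self.alpha
            denominator = contexts[gram[:-1]] + self.alpha * vocab_size
            total += math.log(numerator / denominator)
        return total

    def predict(self, text):
        """Return the label whose model assigns the highest log-likelihood."""
        return max(self.ngram_counts,
                   key=lambda label: self.log_likelihood(text, label))


# Toy usage with invented sentences; n=3 because the data is tiny.
toy_texts = ["the cat sat on the mat", "the dog and the bird",
             "le chat est sur le tapis", "le chien et le oiseau"]
toy_labels = ["en", "en", "fr", "fr"]
toy_model = CharNgramClassifier(n=3).fit(toy_texts, toy_labels)
```

Calling toy_model.predict on a new sentence returns one of the training labels. The toy sentences are invented for this sketch and are not drawn from the DSL data. On a real corpus, the default n=7 would correspond to the size reported in the abstract.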
Anthology ID:
W16-4831
Volume:
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Month:
December
Year:
2016
Address:
Osaka, Japan
Venues:
VarDial | WS
Publisher:
The COLING 2016 Organizing Committee
Pages:
243–250
URL:
https://www.aclweb.org/anthology/W16-4831
PDF:
http://aclanthology.lst.uni-saarland.de/W16-4831.pdf