Linguistic classification: dealing jointly with irrelevance and inconsistency
Laura Franzoi | Andrea Sgarro | Anca Dinu | Liviu P. Dinu
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
In this paper, we present new methods for language classification which put to good use both syntax and fuzzy tools, and are capable of dealing with irrelevant linguistic features (i.e. features which should not contribute to the classification) and even inconsistent features (which do not make sense for specific languages). We introduce a metric distance, based on the generalized Steinhaus transform, which allows one to deal jointly with irrelevance and inconsistency. To evaluate our methods, we test them on a syntactic data set, due to the linguist G. Longobardi and his school. We obtain phylogenetic trees which sometimes outperform the ones obtained by Atkinson and Gray.