Eleni Metheniti


Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus
Eleni Metheniti | Guenter Neumann
Proceedings of the 12th Language Resources and Evaluation Conference

Multilingual inflectional corpora are a scarce resource in the NLP community, especially corpora with annotated morpheme boundaries. We evaluate a multilingual inflectional corpus with morpheme boundaries, generated from the English Wiktionary (Metheniti and Neumann, 2018), against the largest high-quality multilingual inflectional corpus, that of the UniMorph project (Kirov et al., 2018). We confirm that the generated Wikinflection corpus is not of the same quality as UniMorph, but we were able to extract a significant number of words from the intersection of the two corpora. Our Wikinflection corpus benefits from the morpheme segmentations of Wiktionary/Wikinflection and from the manually evaluated morphological feature tags of the UniMorph project, and contains 216K lemmas and 5.4M word forms across 68 languages.
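The intersection step described above can be sketched as a join on (lemma, form) pairs. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes UniMorph's tab-separated lemma/form/features layout, while the Wikinflection-style dictionary mapping (lemma, form) to a hyphen-separated segmentation, the toy entries, and the names `parse_unimorph` and `merge_corpora` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (lemma, inflected form)

# Toy UniMorph-style rows: lemma <TAB> form <TAB> feature bundle.
UNIMORPH_ROWS = [
    "walk\twalked\tV;PST",
    "walk\twalked\tV;V.PTCP;PST",  # syncretic form: same string, second analysis
    "cat\tcats\tN;PL",
    "sing\tsang\tV;PST",
]

# Hypothetical Wikinflection-style entries: (lemma, form) -> morpheme segmentation.
WIKINFLECTION: Dict[Key, str] = {
    ("walk", "walked"): "walk-ed",
    ("cat", "cats"): "cat-s",
    ("dog", "dogs"): "dog-s",
}


def parse_unimorph(rows: List[str]) -> Dict[Key, List[str]]:
    """Group feature bundles by (lemma, form); one form may carry several."""
    table: Dict[Key, List[str]] = defaultdict(list)
    for row in rows:
        lemma, form, feats = row.rstrip("\n").split("\t")
        table[(lemma, form)].append(feats)
    return dict(table)


def merge_corpora(unimorph: Dict[Key, List[str]],
                  segmentations: Dict[Key, str]) -> List[Tuple[str, str, str, str]]:
    """Keep only (lemma, form) pairs found in both resources, pairing the
    segmentation from one side with every feature bundle from the other."""
    merged = []
    for key in sorted(unimorph.keys() & segmentations.keys()):
        lemma, form = key
        for feats in unimorph[key]:
            merged.append((lemma, form, segmentations[key], feats))
    return merged


if __name__ == "__main__":
    for entry in merge_corpora(parse_unimorph(UNIMORPH_ROWS), WIKINFLECTION):
        print("\t".join(entry))
```

Entries present in only one resource (here `sing`/`sang` and `dog`/`dogs`) are dropped, so coverage shrinks to the overlap, but every surviving form carries both a segmentation and a feature bundle.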

How Relevant Are Selectional Preferences for Transformer-based Language Models?
Eleni Metheniti | Tim Van de Cruys | Nabil Hathout
Proceedings of the 28th International Conference on Computational Linguistics

Selectional preference is defined as the tendency of a predicate to favor particular arguments within a certain linguistic context and, likewise, to reject others that result in conflicting or implausible meanings. The stellar success of contextual word embedding models such as BERT in NLP tasks has led many to question whether these models have learned linguistic information, but until now most research has focused on syntactic information. We investigate whether BERT contains information on the selectional preferences of words by examining the probability it assigns to the dependent word given the presence of a head word in a sentence. We use head-dependent word pairs in five different syntactic relations from the SP-10K corpus of selectional preference (Zhang et al., 2019b), in sentences from the ukWaC corpus, and we calculate the correlation between the plausibility score (from SP-10K) and the model probabilities. Our results show that, overall, there is no strong positive or negative correlation in any syntactic relation, but we do find that certain head words have a strong correlation and that masking all words but the head word yields the most positive correlations in most scenarios, which indicates that the semantics of the predicate is indeed an integral and influential factor in the selection of the argument.
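The evaluation step, correlating SP-10K plausibility scores with model probabilities, can be sketched as follows. The abstract does not name the correlation coefficient, so this sketch assumes Spearman's rank correlation (Pearson's r computed on average ranks); the function names and the scores and probabilities are invented placeholders, not data or results from the paper, and in practice the probabilities would come from BERT's masked-token predictions.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r on the (average) ranks of each variable."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up values for hypothetical pairs such as
    # (eat, apple), (eat, rock), (drive, car), (drive, idea).
    plausibility = [9.2, 1.5, 8.8, 2.0]     # placeholder SP-10K-style scores
    model_prob = [0.031, 0.0004, 0.012, 0.0009]  # placeholder P(dependent | context)
    print(f"Spearman rho = {spearman(plausibility, model_prob):.3f}")
```

Ranking first makes the measure insensitive to the very different scales of plausibility scores and probabilities; ties are handled by averaging ranks, and with no ties the result equals the textbook formula 1 − 6Σd²/(n(n² − 1)).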


Identifying Grammar Rules for Language Education with Dependency Parsing in German
Eleni Metheniti | Pomi Park | Kristina Kolesova | Günter Neumann
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)