Ekaterina Vylomova


pdf bib
UniMorph 3.0: Universal Morphology
Arya D. McCarthy | Christo Kirov | Matteo Grella | Amrit Nidhi | Patrick Xia | Kyle Gorman | Ekaterina Vylomova | Sabrina J. Mielke | Garrett Nicolai | Miikka Silfverberg | Timofey Arkhangelskiy | Nataly Krizhanovsky | Andrew Krizhanovsky | Elena Klyachko | Alexey Sorokin | John Mansfield | Valts Ernštreits | Yuval Pinter | Cassandra L. Jacobs | Ryan Cotterell | Mans Hulden | David Yarowsky
Proceedings of the 12th Language Resources and Evaluation Conference

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological paradigms for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. We have implemented several improvements to the extraction pipeline which creates most of our data, so that it is both more complete and more correct. We have added 66 new languages, as well as new parts of speech for 12 languages. We have also amended the schema in several ways. Finally, we present three new community tools: two to validate data for resource creators, and one to make morphological data available from the command line. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University in Baltimore, Maryland. This paper details advances made to the schema, tooling, and dissemination of project resources since the UniMorph 2.0 release described at LREC 2018.

pdf bib
Proceedings of the Second Workshop on Computational Research in Linguistic Typology
Ekaterina Vylomova | Edoardo M. Ponti | Eitan Grossman | Arya D. McCarthy | Yevgeni Berzak | Haim Dubossarsky | Ivan Vulić | Roi Reichart | Anna Korhonen | Ryan Cotterell
Proceedings of the Second Workshop on Computational Research in Linguistic Typology

pdf bib
SIGTYP 2020 Shared Task: Prediction of Typological Features
Johannes Bjerva | Elizabeth Salesky | Sabrina J. Mielke | Aditi Chaudhary | Celano Giuseppe | Edoardo Maria Ponti | Ekaterina Vylomova | Ryan Cotterell | Isabelle Augenstein
Proceedings of the Second Workshop on Computational Research in Linguistic Typology

Typological knowledge bases (KBs) such as WALS (Dryer and Haspelmath, 2013) contain information about linguistic properties of the world’s languages. They have been shown to be useful for downstream applications, including cross-lingual transfer learning and linguistic probing. A major drawback hampering broader adoption of typological KBs is that they are sparsely populated, in the sense that most languages only have annotations for some features, and skewed, in that few features have wide coverage. As typological features often correlate with one another, it is possible to predict them and thus automatically populate typological KBs, which is also the focus of this shared task. Overall, the task attracted 8 submissions from 5 teams, out of which the most successful methods make use of such feature correlations. However, our error analysis reveals that even the strongest submitted systems struggle with predicting feature values for languages where few features are known.

pdf bib
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
Ekaterina Vylomova | Jennifer White | Elizabeth Salesky | Sabrina J. Mielke | Shijie Wu | Edoardo Maria Ponti | Rowan Hall Maudslay | Ran Zmigrod | Josef Valvoda | Svetlana Toldova | Francis Tyers | Elena Klyachko | Ilya Yegorov | Natalia Krizhanovsky | Paula Czarnowska | Irene Nikkarinen | Andrew Krizhanovsky | Tiago Pimentel | Lucas Torroba Hennigen | Christo Kirov | Garrett Nicolai | Adina Williams | Antonios Anastasopoulos | Hilaria Cruz | Eleanor Chodroff | Ryan Cotterell | Miikka Silfverberg | Mans Hulden
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems’ ability to generalize across typologically distinct languages, many of which are low resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrate utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems and achieved over 90% mean accuracy while others were more challenging.


pdf bib
Modelling Tibetan Verbal Morphology
Qianji Di | Ekaterina Vylomova | Tim Baldwin
Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association

pdf bib
A string-to-graph constructive alignment algorithm for discrete and probabilistic language modeling
Andrey Shcherbakov | Ekaterina Vylomova
Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association

pdf bib
Weird Inflects but OK: Making Sense of Morphological Generation Errors
Kyle Gorman | Arya D. McCarthy | Ryan Cotterell | Ekaterina Vylomova | Miikka Silfverberg | Magdalena Markowska
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We conduct a manual error analysis of the CoNLL-SIGMORPHON Shared Task on Morphological Reinflection. This task involves natural language generation: systems are given a word in citation form (e.g., hug) and asked to produce the corresponding inflected form (e.g., the simple past hugged). We propose an error taxonomy and use it to annotate errors made by the top two systems across twelve languages. Many of the observed errors are related to inflectional patterns sensitive to inherent linguistic properties such as animacy or affect; many others are failures to predict truly unpredictable inflectional behaviors. We also find nearly one quarter of the residual “errors” reflect errors in the gold data.

pdf bib
The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection
Arya D. McCarthy | Ekaterina Vylomova | Shijie Wu | Chaitanya Malaviya | Lawrence Wolf-Sonkin | Garrett Nicolai | Christo Kirov | Miikka Silfverberg | Sabrina J. Mielke | Jeffrey Heinz | Ryan Cotterell | Mans Hulden
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages. The first task evolves past years’ inflection tasks by examining transfer of morphological inflection knowledge from a high-resource language to a low-resource language. This year also presents a new second challenge on lemmatization and morphological feature analysis in context. All submissions featured a neural component and built on either this year’s strong baselines or highly ranked systems from previous years’ shared tasks. Every participating team improved in accuracy over the baselines for the inflection task (though not Levenshtein distance), and every team in the contextual analysis task improved on both state-of-the-art neural and non-neural baselines.

pdf bib
Evaluation of Semantic Change of Harm-Related Concepts in Psychology
Ekaterina Vylomova | Sean Murphy | Nicholas Haslam
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

The paper focuses on diachronic evaluation of semantic changes of harm-related concepts in psychology. More specifically, we investigate a hypothesis that certain concepts such as “addiction”, “bullying”, “harassment”, “prejudice”, and “trauma” became broader during the last four decades. We evaluate semantic changes using two models: an LSA-based model from Sagi et al. (2009) and a diachronic adaptation of word2vec from Hamilton et al. (2016), that are trained on a large corpus of journal abstracts covering the period of 1980– 2019. Several concepts showed evidence of broadening. “Addiction” moved from physiological dependency on a substance to include psychological dependency on gaming and the Internet. Similarly, “harassment” and “trauma” shifted towards more psychological meanings. On the other hand, “bullying” has transformed into a more victim-related concept and expanded to new areas such as workplaces.

pdf bib
Proceedings of TyP-NLP: The First Workshop on Typology for Polyglot NLP
Haim Dubossarsky | Arya D. McCarthy | Edoardo Maria Ponti | Ivan Vulić | Ekaterina Vylomova | Yevgeni Berzak | Ryan Cotterell | Manaal Faruqui | Anna Korhonen | Roi Reichart
Proceedings of TyP-NLP: The First Workshop on Typology for Polyglot NLP

pdf bib
Contextualization of Morphological Inflection
Ekaterina Vylomova | Ryan Cotterell | Trevor Cohn | Timothy Baldwin | Jason Eisner
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Critical to natural language generation is the production of correctly inflected text. In this paper, we isolate the task of predicting a fully inflected sentence from its partially lemmatized version. Unlike traditional morphological inflection or surface realization, our task input does not provide “gold” tags that specify what morphological features to realize on each lemmatized word; rather, such features must be inferred from sentential context. We develop a neural hybrid graphical model that explicitly reconstructs morphological features before predicting the inflected forms, and compare this to a system that directly predicts the inflected forms without relying on any morphological annotation. We experiment on several typologically diverse languages from the Universal Dependencies treebanks, showing the utility of incorporating linguistically-motivated latent variables into NLP models.


pdf bib
UniMorph 2.0: Universal Morphology
Christo Kirov | Ryan Cotterell | John Sylak-Glassman | Géraldine Walther | Ekaterina Vylomova | Patrick Xia | Manaal Faruqui | Sabrina J. Mielke | Arya McCarthy | Sandra Kübler | David Yarowsky | Jason Eisner | Mans Hulden
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
The CoNLLSIGMORPHON 2018 Shared Task: Universal Morphological Reinflection
Ryan Cotterell | Christo Kirov | John Sylak-Glassman | Géraldine Walther | Ekaterina Vylomova | Arya D. McCarthy | Katharina Kann | Sabrina J. Mielke | Garrett Nicolai | Miikka Silfverberg | David Yarowsky | Jason Eisner | Mans Hulden
Proceedings of the CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection


pdf bib
CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection in 52 Languages
Ryan Cotterell | Christo Kirov | John Sylak-Glassman | Géraldine Walther | Ekaterina Vylomova | Patrick Xia | Manaal Faruqui | Sandra Kübler | David Yarowsky | Jason Eisner | Mans Hulden
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection

pdf bib
Context-Aware Prediction of Derivational Word-forms
Ekaterina Vylomova | Ryan Cotterell | Timothy Baldwin | Trevor Cohn
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Derivational morphology is a fundamental and complex characteristic of language. In this paper we propose a new task of predicting the derivational form of a given base-form lemma that is appropriate for a given context. We present an encoder-decoder style neural network to produce a derived form character-by-character, based on its corresponding character-level representation of the base form and the context. We demonstrate that our model is able to generate valid context-sensitive derivations from known base forms, but is less accurate under lexicon agnostic setting.

pdf bib
Word Representation Models for Morphologically Rich Languages in Neural Machine Translation
Ekaterina Vylomova | Trevor Cohn | Xuanli He | Gholamreza Haffari
Proceedings of the First Workshop on Subword and Character Level Models in NLP

Out-of-vocabulary words present a great challenge for Machine Translation. Recently various character-level compositional models were proposed to address this issue. In current research we incorporate two most popular neural architectures, namely LSTM and CNN, into hard- and soft-attentional models of translation for character-level representation of the source. We propose semantic and morphological intrinsic evaluation of encoder-level representations. Our analysis of the learned representations reveals that character-based LSTM seems to be better at capturing morphological aspects compared to character-based CNN. We also show that hard-attentional model provides better character-level representations compared to vanilla one.

pdf bib
Paradigm Completion for Derivational Morphology
Ryan Cotterell | Ekaterina Vylomova | Huda Khayrallah | Christo Kirov | David Yarowsky
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task. We overview the theoretical motivation for a paradigmatic treatment of derivational morphology, and introduce the task of derivational paradigm completion as a parallel to inflectional paradigm completion. State-of-the-art neural models adapted from the inflection task are able to learn the range of derivation patterns, and outperform a non-neural baseline by 16.4%. However, due to semantic, historical, and lexical considerations involved in derivational morphology, future work will be needed to achieve performance parity with inflection-generating systems.


pdf bib
Phonotactic Modeling of Extremely Low Resource Languages
Andrei Shcherbakov | Ekaterina Vylomova | Nick Thieberger
Proceedings of the Australasian Language Technology Association Workshop 2016

pdf bib
Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning
Ekaterina Vylomova | Laura Rimell | Trevor Cohn | Timothy Baldwin
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
VectorWeavers at SemEval-2016 Task 10: From Incremental Meaning to Semantic Unit (phrase by phrase)
Andreas Scherbakov | Ekaterina Vylomova | Fei Liu | Timothy Baldwin
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)


pdf bib
Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions
Jing Peng | Anna Feldman | Ekaterina Vylomova
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)