Ling Liu


2020

pdf bib
Analogy Models for Neural Word Inflection
Ling Liu | Mans Hulden
Proceedings of the 28th International Conference on Computational Linguistics

Analogy is assumed to be the cognitive mechanism speakers resort to in order to inflect an unknown form of a lexeme based on knowledge of other words in a language. In this process, an analogy is formed between word forms within an inflectional paradigm but also across paradigms. As neural network models for inflection are typically trained only on lemma-target form pairs, we propose three new ways to provide neural models with additional source forms to strengthen analogy-formation, and compare our methods to other approaches in the literature. We show that the proposed methods of providing a Transformer sequence-to-sequence model with additional analogy sources in the input are consistently effective, and improve upon recent state-of-the-art results on 46 languages, particularly in low-resource settings. We also propose a method to combine the analogy-motivated approach with data hallucination or augmentation. We find that the two approaches are complementary to each other and combining the two approaches is especially helpful when the training data is extremely limited.

pdf bib
IGT2P: From Interlinear Glossed Texts to Paradigms
Sarah Moeller | Ling Liu | Changbing Yang | Katharina Kann | Mans Hulden
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

An intermediate step in the linguistic analysis of an under-documented language is to find and organize inflected forms that are attested in natural speech. From this data, linguists generate unseen inflected word forms in order to test hypotheses about the language’s inflectional patterns and to complete inflectional paradigm tables. To get the data linguists spend many hours manually creating interlinear glossed texts (IGTs). We introduce a new task that speeds this process and automatically generates new morphological resources for natural language processing systems: IGT-to-paradigms (IGT2P). IGT2P generates entire morphological paradigms from IGT input. We show that existing morphological reinflection models can solve the task with 21% to 64% accuracy, depending on the language. We further find that (i) having a language expert spend only a few hours cleaning the noisy IGT data improves performance by as much as 21 percentage points, and (ii) POS tags, which are generally considered a necessary part of NLP morphological reinflection input, have no effect on the accuracy of the models considered here.

pdf bib
Leveraging Principal Parts for Morphological Inflection
Ling Liu | Mans Hulden
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper presents the submission by the CU Ling team from the University of Colorado to SIGMORPHON 2020 shared task 0 on morphological inflection. The task is to generate the target inflected word form given a lemma form and a target morphosyntactic description. Our system uses the Transformer architecture. Our overall approach is to treat the morphological inflection task as a paradigm cell filling problem and to design the system to leverage principal parts information for better morphological inflection when the training data is limited. We train one model for each language separately without external data. The overall average performance of our submission ranks the first in both average accuracy and Levenshtein distance from the gold inflection among all submissions including those using external resources.

2018

pdf bib
Morphological Reinflection in Context: CU Boulder’s Submission to CoNLLSIGMORPHON 2018 Shared Task
Ling Liu | Ilamvazhuthy Subbiah | Adam Wiemerslage | Jonathan Lilley | Sarah Moeller
Proceedings of the CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

pdf bib
A Computational Model for the Linguistic Notion of Morphological Paradigm
Miikka Silfverberg | Ling Liu | Mans Hulden
Proceedings of the 27th International Conference on Computational Linguistics

In supervised learning of morphological patterns, the strategy of generalizing inflectional tables into more abstract paradigms through alignment of the longest common subsequence found in an inflection table has been proposed as an efficient method to deduce the inflectional behavior of unseen word forms. In this paper, we extend this notion of morphological ‘paradigm’ from earlier work and provide a formalization that more accurately matches linguist intuitions about what an inflectional paradigm is. Additionally, we propose and evaluate a mechanism for learning full human-readable paradigm specifications from incomplete data—a scenario when we only have access to a few inflected forms for each lexeme, and want to reconstruct the missing inflections as well as generalize and group the witnessed patterns into a model of more abstract paradigmatic behavior of lexemes.

2017

pdf bib
Data Augmentation for Morphological Reinflection
Miikka Silfverberg | Adam Wiemerslage | Ling Liu | Lingshuang Jack Mao
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection

pdf bib
Evaluation of Finite State Morphological Analyzers Based on Paradigm Extraction from Wiktionary
Ling Liu | Mans Hulden
Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing (FSMNLP 2017)

2016

pdf bib
Morphological reinflection with conditional random fields and unsupervised features
Ling Liu | Lingshuang Jack Mao
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology