Aaron Mueller


2020

pdf bib
Cross-Linguistic Syntactic Evaluation of Word Prediction Models
Aaron Mueller | Garrett Nicolai | Panayiota Petrou-Zeniou | Natalia Talmina | Tal Linzen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

A range of studies have concluded that neural word prediction models can distinguish grammatical from ungrammatical sentences with high accuracy. However, these studies are based primarily on monolingual evidence from English. To investigate how these models’ ability to learn syntax varies by language, we introduce CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models. CLAMS includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian, generated from grammars we develop. We use CLAMS to evaluate LSTM language models as well as monolingual and multilingual BERT. Across languages, monolingual LSTMs achieved high accuracy on dependencies without attractors, and generally poor accuracy on agreement across object relative clauses. On other constructions, agreement accuracy was generally higher in languages with richer morphology. Multilingual models generally underperformed monolingual models. Multilingual BERT showed high syntactic accuracy on English, but noticeable deficiencies in other languages.

pdf bib
The Johns Hopkins University Bible Corpus: 1600+ Tongues for Typological Exploration
Arya D. McCarthy | Rachel Wicks | Dylan Lewis | Aaron Mueller | Winston Wu | Oliver Adams | Garrett Nicolai | Matt Post | David Yarowsky
Proceedings of the 12th Language Resources and Evaluation Conference

We present findings from the creation of a massively parallel corpus in over 1600 languages, the Johns Hopkins University Bible Corpus (JHUBC). The corpus consists of over 4000 unique translations of the Christian Bible and counting. Our data is derived from scraping several online resources and merging them with existing corpora, combining them under a common scheme that is verse-parallel across all translations. We detail our effort to scrape, clean, align, and utilize this ripe multilingual dataset. The corpus captures the great typological variety of the world’s languages. We catalog this by showing highly similar proportions of representation of Ethnologue’s typological features in our corpus. We also give an example application: projecting pronoun features like clusivity across alignments to richly annotate languages which do not mark the distinction.

pdf bib
An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages
Aaron Mueller | Garrett Nicolai | Arya D. McCarthy | Dylan Lewis | Winston Wu | David Yarowsky
Proceedings of the 12th Language Resources and Evaluation Conference

In this work, we explore massively multilingual low-resource neural machine translation. Using translations of the Bible (which have parallel structure across languages), we train models with up to 1,107 source languages. We create various multilingual corpora, varying the number and relatedness of source languages. Using these, we investigate the best ways to use this many-way aligned resource for multilingual machine translation. Our experiments employ a grammatically and phylogenetically diverse set of source languages during testing for more representative evaluations. We find that best practices in this domain are highly language-specific: adding more languages to a training set is often better, but too many harms performance—the best number depends on the source language. Furthermore, training on related languages can improve or degrade performance, depending on the language. As there is no one-size-fits-most answer, we find that it is critical to tailor one’s approach to the source language and its typology.

pdf bib
Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages
Garrett Nicolai | Dylan Lewis | Arya D. McCarthy | Aaron Mueller | Winston Wu | David Yarowsky
Proceedings of the 12th Language Resources and Evaluation Conference

Exploiting the broad translation of the Bible into the world’s languages, we train and distribute morphosyntactic tools for approximately one thousand languages, vastly outstripping previous distributions of tools devoted to the processing of inflectional morphology. Evaluation of the tools on a subset of available inflectional dictionaries demonstrates strong initial models, supplemented and improved through ensembling and dictionary-based reranking. Likewise, a novel type-to-token based evaluation metric allows us to confirm that models generalize well across rare and common forms alike

2019

pdf bib
Modeling Color Terminology Across Thousands of Languages
Arya D. McCarthy | Winston Wu | Aaron Mueller | William Watson | David Yarowsky
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

There is an extensive history of scholarship into what constitutes a “basic” color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969). This paper employs a set of diverse measures on massively cross-linguistic data to operationalize and critique the Berlin and Kay color term hypotheses. Collectively, the 14 empirically-grounded computational linguistic metrics we design—as well as their aggregation—correlate strongly with both the Berlin and Kay basic/secondary color term partition (γ = 0.96) and their hypothesized universal acquisition sequence. The measures and result provide further empirical evidence from computational linguistics in support of their claims, as well as additional nuance: they suggest treating the partition as a spectrum instead of a dichotomy.

pdf bib
Quantity doesn’t buy quality syntax with neural language models
Marten van Schijndel | Aaron Mueller | Tal Linzen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Recurrent neural networks can learn to predict upcoming words remarkably well on average; in syntactically complex contexts, however, they often assign unexpectedly high probabilities to ungrammatical words. We investigate to what extent these shortcomings can be mitigated by increasing the size of the network and the corpus on which it is trained. We find that gains from increasing network size are minimal beyond a certain point. Likewise, expanding the training corpus yields diminishing returns; we estimate that the training corpus would need to be unrealistically large for the models to match human performance. A comparison to GPT and BERT, Transformer-based models trained on billions of words, reveals that these models perform even more poorly than our LSTMs in some constructions. Our results make the case for more data efficient architectures.

pdf bib
Sentence-Level Adaptation for Low-Resource Neural Machine Translation
Aaron Mueller | Yash Kumar Lal
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages