Huda Khayrallah


2020

The JHU Submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education
Huda Khayrallah | Jacob Bremerman | Arya D. McCarthy | Kenton Murray | Winston Wu | Matt Post
Proceedings of the Fourth Workshop on Neural Generation and Translation

This paper presents the Johns Hopkins University submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education (STAPLE). We participated in all five language tasks, placing first in each. Our approach involved a language-agnostic pipeline of three components: (1) building strong machine translation systems on general-domain data, (2) fine-tuning on Duolingo-provided data, and (3) generating n-best lists which are then filtered with various score-based techniques. In addition to the language-agnostic pipeline, we attempted a number of linguistically-motivated approaches, with, unfortunately, little success. We also find that improving BLEU performance of the beam-search generated translation does not necessarily improve on the task metric, weighted macro F1 of an n-best list.

SMRT Chatbots: Improving Non-Task-Oriented Dialog with Simulated Multiple Reference Training
Huda Khayrallah | João Sedoc
Findings of the Association for Computational Linguistics: EMNLP 2020

Non-task-oriented dialog models suffer from poor quality and non-diverse responses. To overcome limited conversational data, we apply Simulated Multiple Reference Training (SMRT; Khayrallah et al., 2020), and use a paraphraser to simulate multiple responses per training prompt. We find SMRT improves over a strong Transformer baseline as measured by human and automatic quality scores and lexical diversity. We also find SMRT is comparable to pretraining in human evaluation quality, and outperforms pretraining on automatic quality and lexical diversity, without requiring related-domain dialog data.

Simulated multiple reference training improves low-resource machine translation
Huda Khayrallah | Brian Thompson | Matt Post | Philipp Koehn
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings. We introduce Simulated Multiple Reference Training (SMRT), a novel MT training method that approximates the full space of possible translations by sampling a paraphrase of the reference sentence from a paraphraser and training the MT model to predict the paraphraser’s distribution over possible tokens. We demonstrate the effectiveness of SMRT in low-resource settings when translating to English, with improvements of 1.2 to 7.0 BLEU. We also find SMRT is complementary to back-translation.

On the Evaluation of Machine Translation n-best Lists
Jacob Bremerman | Huda Khayrallah | Douglas Oard | Matt Post
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems

The standard machine translation evaluation framework measures the single-best output of machine translation systems. There are, however, many situations where n-best lists are needed, yet there is no established way of evaluating them. This paper establishes a framework for addressing n-best evaluation by outlining three different questions one could consider when determining how one would define a ‘good’ n-best list and proposing evaluation measures for each question. The first and principal contribution is an evaluation measure that characterizes the translation quality of an entire n-best list by asking whether many of the valid translations are placed near the top of the list. The second is a measure that uses gold translations with preference annotations to ask to what degree systems can produce ranked lists in preference order. The third is a measure that rewards partial matches, evaluating the closeness of the many items in an n-best list to a set of many valid references. These three perspectives make clear that having access to many references can be useful when n-best evaluation is the goal.

2019

HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation
Brian Thompson | Rebecca Knowles | Xuan Zhang | Huda Khayrallah | Kevin Duh | Philipp Koehn
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Bilingual lexicons are valuable resources used by professional human translators. While these resources can be easily incorporated in statistical machine translation, it is unclear how to best do so in the neural framework. In this work, we present the HABLex dataset, designed to test methods for bilingual lexicon integration into neural machine translation. Our data consists of human-generated alignments of words and phrases in machine translation test sets in three language pairs (Russian-English, Chinese-English, and Korean-English), resulting in clean bilingual lexicons which are well matched to the reference. We also present two simple baselines (constrained decoding and continued training) and an improvement to continued training to address overfitting.

Deep Generalized Canonical Correlation Analysis
Adrian Benton | Huda Khayrallah | Biman Gujral | Dee Ann Reisinger | Sheng Zhang | Raman Arora
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

We present Deep Generalized Canonical Correlation Analysis (DGCCA), a method for learning nonlinear transformations of arbitrarily many views of data, such that the resulting transformations are maximally informative of each other. While methods for nonlinear two-view representation learning (Deep CCA; Andrew et al., 2013) and linear many-view representation learning (Generalized CCA; Horst, 1961) exist, DGCCA combines the flexibility of nonlinear (deep) representation learning with the statistical power of incorporating information from many sources, or views. We present the DGCCA formulation as well as an efficient stochastic optimization algorithm for solving it. We learn and evaluate DGCCA representations for three downstream tasks: phonetic transcription from acoustic and articulatory measurements, recommending hashtags, and recommending friends on a dataset of Twitter users.

Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting
J. Edward Hu | Huda Khayrallah | Ryan Culkin | Patrick Xia | Tongfei Chen | Matt Post | Benjamin Van Durme
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Lexically-constrained sequence decoding allows for explicit positive or negative phrase-based constraints to be placed on target output strings in generation tasks such as machine translation or monolingual text rewriting. We describe vectorized dynamic beam allocation, which extends work in lexically-constrained decoding to work with batching, leading to a five-fold improvement in throughput when working with positive constraints. Faster decoding enables faster exploration of constraint strategies: we illustrate this via data augmentation experiments with a monolingual rewriter applied to the tasks of natural language inference, question answering and machine translation, showing improvements in all three.

Overcoming Catastrophic Forgetting During Domain Adaptation of Neural Machine Translation
Brian Thompson | Jeremy Gwinnup | Huda Khayrallah | Kevin Duh | Philipp Koehn
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Continued training is an effective method for domain adaptation in neural machine translation. However, in-domain gains from adaptation come at the expense of general-domain performance. In this work, we interpret the drop in general-domain performance as catastrophic forgetting of general-domain knowledge. To mitigate it, we adapt Elastic Weight Consolidation (EWC)—a machine learning method for learning a new task without forgetting previous tasks. Our method retains the majority of general-domain performance lost in continued training without degrading in-domain performance, outperforming the previous state-of-the-art. We also explore the full range of general-domain performance available when some in-domain degradation is acceptable.
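
As a rough illustration of the Elastic Weight Consolidation idea mentioned in this abstract, the sketch below computes the standard EWC quadratic penalty, which discourages parameters from drifting away from their general-domain values in proportion to an importance estimate (commonly a diagonal Fisher information estimate). The function name, the flat-list parameter representation, and the `strength` hyperparameter are illustrative assumptions; this is not the authors' implementation.

```python
def ewc_penalty(params, anchor_params, fisher, strength=1.0):
    """Standard EWC penalty: 0.5 * strength * sum_i F_i * (theta_i - theta*_i)^2.

    params        -- current (in-domain) parameter values
    anchor_params -- parameter values of the general-domain model
    fisher        -- per-parameter importance estimates (assumed precomputed)
    strength      -- hypothetical regularization weight
    """
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


# During continued training, this term would be added to the usual
# in-domain translation loss (assumed to be computed elsewhere).
no_drift = ewc_penalty([1.0, 2.0], [1.0, 2.0], [3.0, 4.0])
some_drift = ewc_penalty([2.0, 2.0], [1.0, 2.0], [3.0, 4.0], strength=2.0)
```

Parameters with a large importance estimate are held close to their general-domain values, while unimportant ones remain free to adapt to the new domain.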

2018

Improving Low Resource Machine Translation using Morphological Glosses (Non-archival Extended Abstract)
Steven Shearing | Christo Kirov | Huda Khayrallah | David Yarowsky
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Regularized Training Objective for Continued Training for Domain Adaptation in Neural Machine Translation
Huda Khayrallah | Brian Thompson | Kevin Duh | Philipp Koehn
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

Supervised domain adaptation—where a large generic corpus and a smaller in-domain corpus are both available for training—is a challenge for neural machine translation (NMT). Standard practice is to train a generic model and use it to initialize a second model, then continue training the second model on in-domain data to produce an in-domain model. We add an auxiliary term to the training objective during continued training that minimizes the cross entropy between the in-domain model’s output word distribution and that of the out-of-domain model to prevent the model’s output from differing too much from the original out-of-domain model. We perform experiments on EMEA (descriptions of medicines) and TED (rehearsed presentations), initialized from a general domain (WMT) model. Our method shows improvements over standard continued training by up to 1.5 BLEU.
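
The auxiliary term described in this abstract can be sketched for a single output position as follows. This is a minimal illustration only: the way the two terms are combined (a simple weighted sum with a hypothetical `reg_weight`) and all function names are assumptions, not the formulation from the paper.

```python
import math

EPS = 1e-12  # guards against log(0)


def cross_entropy(reference_dist, predicted_dist):
    """Cross entropy of predicted_dist measured against reference_dist."""
    return -sum(r * math.log(p + EPS) for r, p in zip(reference_dist, predicted_dist))


def regularized_loss(in_domain_probs, out_domain_probs, gold_index, reg_weight=0.1):
    """Negative log-likelihood of the gold word plus a weighted auxiliary term.

    in_domain_probs  -- output word distribution of the model being adapted
    out_domain_probs -- output word distribution of the frozen out-of-domain model
    gold_index       -- vocabulary index of the reference word
    reg_weight       -- hypothetical weight on the auxiliary term
    """
    nll = -math.log(in_domain_probs[gold_index] + EPS)
    auxiliary = cross_entropy(out_domain_probs, in_domain_probs)
    return nll + reg_weight * auxiliary


# The auxiliary term grows as the adapted model moves away from the
# out-of-domain model's distribution.
close = cross_entropy([0.5, 0.5], [0.5, 0.5])
far = cross_entropy([0.5, 0.5], [0.9, 0.1])
```

With `reg_weight` set to zero the loss reduces to ordinary continued training; larger values keep the adapted model's predictions closer to those of the out-of-domain model.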

On the Impact of Various Types of Noise on Neural Machine Translation
Huda Khayrallah | Philipp Koehn
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

We examine how various types of noise in the parallel training data impact the quality of neural machine translation systems. We create five types of artificial noise and analyze how they degrade performance in neural and statistical machine translation. We find that neural models are generally more harmed by noise than statistical models. For one especially egregious type of noise they learn to just copy the input sentence.

Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation
Brian Thompson | Huda Khayrallah | Antonios Anastasopoulos | Arya D. McCarthy | Kevin Duh | Rebecca Marvin | Paul McNamee | Jeremy Gwinnup | Tim Anderson | Philipp Koehn
Proceedings of the Third Conference on Machine Translation: Research Papers

To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component’s contribution to, and capacity for, domain adaptation. We find that freezing any single component during continued training has minimal impact on performance, and that performance is surprisingly good when a single component is adapted while holding the rest of the model fixed. We also find that continued training does not move the model very far from the out-of-domain model, compared to a sensitivity analysis metric, suggesting that the out-of-domain model can provide a good generic initialization for the new domain.

Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering
Philipp Koehn | Huda Khayrallah | Kenneth Heafield | Mikel L. Forcada
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems. Seventeen participants from companies, national research labs, and universities participated in this task.

The JHU Parallel Corpus Filtering Systems for WMT 2018
Huda Khayrallah | Hainan Xu | Philipp Koehn
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This work describes our submission to the WMT18 Parallel Corpus Filtering shared task. We use a slightly modified version of the Zipporah Corpus Filtering toolkit (Xu and Koehn, 2017), which computes an adequacy score and a fluency score on a sentence pair, and use a weighted sum of the scores as the selection criterion. This work differs from Zipporah in that we experiment with using the noisy corpus to be filtered to compute the combination weights, and thus avoid generating synthetic data as in standard Zipporah.
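
The selection step, ranking sentence pairs by a weighted sum of an adequacy score and a fluency score, could look roughly like the sketch below. It does not use the Zipporah toolkit or its API: the scores are assumed to be precomputed, higher is assumed to mean better, the weights are placeholders, and keeping a fraction of sentence pairs (rather than some other unit of corpus size) is a simplification.

```python
def combined_score(adequacy, fluency, adequacy_weight, fluency_weight):
    """Weighted sum of the two per-pair scores."""
    return adequacy_weight * adequacy + fluency_weight * fluency


def select_top_fraction(sentence_pairs, adequacy_weight, fluency_weight, keep_fraction):
    """Rank pairs by combined score and keep the best keep_fraction of them."""
    ranked = sorted(
        sentence_pairs,
        key=lambda pair: combined_score(
            pair["adequacy"], pair["fluency"], adequacy_weight, fluency_weight
        ),
        reverse=True,
    )
    keep_count = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep_count]


# Hypothetical scored sentence pairs, for illustration only.
pairs = [
    {"id": "a", "adequacy": 0.9, "fluency": 0.8},
    {"id": "b", "adequacy": 0.2, "fluency": 0.9},
    {"id": "c", "adequacy": 0.7, "fluency": 0.1},
    {"id": "d", "adequacy": 0.1, "fluency": 0.2},
]
selected = select_top_fraction(pairs, adequacy_weight=0.5, fluency_weight=0.5, keep_fraction=0.5)
```

In the abstract's setting the combination weights are estimated from the noisy corpus itself; here they are simply passed in as arguments.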

2017

Neural Lattice Search for Domain Adaptation in Machine Translation
Huda Khayrallah | Gaurav Kumar | Kevin Duh | Matt Post | Philipp Koehn
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Domain adaptation is a major challenge for neural machine translation (NMT). Given unknown words or new domains, NMT systems tend to generate fluent translations at the expense of adequacy. We present a stack-based lattice search algorithm for NMT and show that constraining its search space with lattices generated by phrase-based machine translation (PBMT) improves robustness. We report consistent BLEU score gains across four diverse domain adaptation tasks involving medical, IT, Koran, or subtitles texts.

The JHU Machine Translation Systems for WMT 2017
Shuoyang Ding | Huda Khayrallah | Philipp Koehn | Matt Post | Gaurav Kumar | Kevin Duh
Proceedings of the Second Conference on Machine Translation

Paradigm Completion for Derivational Morphology
Ryan Cotterell | Ekaterina Vylomova | Huda Khayrallah | Christo Kirov | David Yarowsky
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task. We overview the theoretical motivation for a paradigmatic treatment of derivational morphology, and introduce the task of derivational paradigm completion as a parallel to inflectional paradigm completion. State-of-the-art neural models adapted from the inflection task are able to learn the range of derivation patterns, and outperform a non-neural baseline by 16.4%. However, due to semantic, historical, and lexical considerations involved in derivational morphology, future work will be needed to achieve performance parity with inflection-generating systems.

2016

The JHU Machine Translation Systems for WMT 2016
Shuoyang Ding | Kevin Duh | Huda Khayrallah | Philipp Koehn | Matt Post
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers