Marc’Aurelio Ranzato


2020

On The Evaluation of Machine Translation Systems Trained With Back-Translation
Sergey Edunov | Myle Ott | Marc’Aurelio Ranzato | Michael Auli
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Back-translation is a widely used data augmentation technique which leverages target monolingual data. However, its effectiveness has been challenged since automatic metrics such as BLEU only show significant improvements for test examples where the source itself is a translation, or translationese. This is believed to be due to translationese inputs better matching the back-translated training data. In this work, we show that this conjecture is not empirically supported and that back-translation improves translation quality of both naturally occurring text and translationese according to professional human translators. We provide empirical evidence to support the view that back-translation is preferred by humans because it produces more fluent outputs. BLEU cannot capture human preferences because references are translationese when source sentences are natural text. We recommend complementing BLEU with a language model score to measure fluency.

2019

The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali–English and Sinhala–English
Francisco Guzmán | Peng-Jen Chen | Myle Ott | Juan Pino | Guillaume Lample | Philipp Koehn | Vishrav Chaudhary | Marc’Aurelio Ranzato
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available. Besides the technical challenges of learning with limited supervision, it is difficult to evaluate methods trained on low-resource language pairs because of the lack of freely and publicly available benchmarks. In this work, we introduce the FLORES evaluation datasets for Nepali–English and Sinhala–English, based on sentences translated from Wikipedia. Compared to English, these are languages with very different morphology and syntax, for which little out-of-domain parallel data is available and for which relatively large amounts of monolingual data are freely available. We describe our process to collect and cross-check the quality of translations, and we report baseline performance using several learning settings: fully supervised, weakly supervised, semi-supervised, and fully unsupervised. Our experiments demonstrate that current state-of-the-art methods perform rather poorly on this benchmark, posing a challenge to the research community working on low-resource MT. Data and code to reproduce our experiments are available at https://github.com/facebookresearch/flores.

Facebook AI’s WAT19 Myanmar-English Translation Task Submission
Peng-Jen Chen | Jiajun Shen | Matthew Le | Vishrav Chaudhary | Ahmed El-Kishky | Guillaume Wenzek | Myle Ott | Marc’Aurelio Ranzato
Proceedings of the 6th Workshop on Asian Translation

This paper describes Facebook AI’s submission to the WAT 2019 Myanmar-English translation task. Our baseline systems are BPE-based transformer models. We explore methods to leverage monolingual data to improve generalization, including self-training, back-translation and their combination. We further improve results by using noisy channel re-ranking and ensembling. We demonstrate that these techniques can significantly improve not only a system trained with additional monolingual data, but even the baseline system trained exclusively on the provided small parallel dataset. Our system ranks first in both directions according to human evaluation and BLEU, with a gain of over 8 BLEU points above the second best system.
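
The noisy channel re-ranking step mentioned above can be illustrated with a minimal sketch. It assumes the common formulation in which each n-best candidate is scored by a weighted sum of three log-probabilities: the forward model, a backward (channel) model, and a target-side language model. The `Candidate` class, the function names, the weights, and the scores below are all hypothetical and are not taken from the paper; length normalization is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One n-best hypothesis with illustrative (made-up) model scores."""
    text: str
    direct_logprob: float   # log P(target | source) from the forward model
    channel_logprob: float  # log P(source | target) from the backward model
    lm_logprob: float       # log P(target) from a target-side language model


def noisy_channel_score(c, channel_weight=1.0, lm_weight=1.0):
    # Weighted sum of the three log-probabilities (no length normalization).
    return (c.direct_logprob
            + channel_weight * c.channel_logprob
            + lm_weight * c.lm_logprob)


def rerank(candidates, channel_weight=1.0, lm_weight=1.0):
    # Highest combined score first.
    return sorted(
        candidates,
        key=lambda c: noisy_channel_score(c, channel_weight, lm_weight),
        reverse=True,
    )


candidates = [
    Candidate("hypothesis A", direct_logprob=-2.0, channel_logprob=-5.0, lm_logprob=-6.0),
    Candidate("hypothesis B", direct_logprob=-2.5, channel_logprob=-3.0, lm_logprob=-4.0),
]

best_direct = rerank(candidates, channel_weight=0.0, lm_weight=0.0)[0]
best_channel = rerank(candidates)[0]
print(best_direct.text, "|", best_channel.text)
```

With both weights set to zero the ranking reduces to the forward model alone; in practice the weights would be tuned on a validation set.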

2018

Classical Structured Prediction Losses for Sequence to Sequence Learning
Sergey Edunov | Myle Ott | Michael Auli | David Grangier | Marc’Aurelio Ranzato
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

There has been much recent work on training neural attention models at the sequence level, using either reinforcement learning-style methods or beam optimization. In this paper, we survey a range of classical objective functions that have been widely used to train linear models for structured prediction and apply them to neural sequence-to-sequence models. Our experiments show that these losses can perform surprisingly well, slightly outperforming beam search optimization in a like-for-like setup. We also report new state-of-the-art results on both IWSLT’14 German-English translation and Gigaword abstractive summarization. On the large WMT’14 English-French task, sequence-level training achieves 41.5 BLEU, which is on par with the state of the art.

Phrase-Based & Neural Unsupervised Machine Translation
Guillaume Lample | Myle Ott | Alexis Conneau | Ludovic Denoyer | Marc’Aurelio Ranzato
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate with access only to large monolingual corpora in each language. We propose two model variants, a neural and a phrase-based model. Both versions leverage a careful initialization of the parameters, the denoising effect of language models, and automatic generation of parallel data by iterative back-translation. These models are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters. On the widely used WMT’14 English-French and WMT’16 German-English benchmarks, our models respectively obtain 28.1 and 25.2 BLEU points without using a single parallel sentence, outperforming the state of the art by more than 11 BLEU points. On low-resource languages like English-Urdu and English-Romanian, our methods achieve even better results than semi-supervised and supervised approaches that leverage the few available bitexts. Our code for NMT and PBSMT is publicly available.