Alexei Baevski


pdf bib
Cloze-driven Pretraining of Self-attention Networks
Alexei Baevski | Sergey Edunov | Yinhan Liu | Luke Zettlemoyer | Michael Auli
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems. Our model solves a cloze-style word reconstruction task, where each word is ablated and must be predicted given the rest of the text. Experiments demonstrate large performance gains on GLUE and new state of the art results on NER as well as constituency parsing benchmarks, consistent with BERT. We also present a detailed analysis of a number of factors that contribute to effective pretraining, including data domain and size, model capacity, and variations on the cloze objective.

pdf bib
Facebook FAIR’s WMT19 News Translation Task Submission
Nathan Ng | Kyra Yee | Alexei Baevski | Myle Ott | Michael Auli | Sergey Edunov
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes Facebook FAIR’s submission to the WMT19 shared news translation task. We participate in four language directions, English <-> German and English <-> Russian in both directions. Following our submission from last year, our baseline systems are large BPE-based transformer models trained with the FAIRSEQ sequence modeling toolkit. This year we experiment with different bitext data filtering schemes, as well as with adding filtered back-translated data. We also ensemble and fine-tune our models on domain-specific data, then decode using noisy channel model reranking. Our system improves on our previous system’s performance by 4.5 BLEU points and achieves the best case-sensitive BLEU score for the translation direction English→Russian.

pdf bib
Pre-trained language model representations for language generation
Sergey Edunov | Alexei Baevski | Michael Auli
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Pre-trained language model representations have been successful in a wide range of language understanding tasks. In this paper, we examine different strategies to integrate pre-trained representations into sequence to sequence models and apply it to neural machine translation and abstractive summarization. We find that pre-trained representations are most effective when added to the encoder network which slows inference by only 14%. Our experiments in machine translation show gains of up to 5.3 BLEU in a simulated resource-poor setup. While returns diminish with more labeled data, we still observe improvements when millions of sentence-pairs are available. Finally, on abstractive summarization we achieve a new state of the art on the full text version of CNN/DailyMail.

pdf bib
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott | Sergey Edunov | Alexei Baevski | Angela Fan | Sam Gross | Nathan Ng | David Grangier | Michael Auli
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found at