Marcin Junczys-Dowmunt


2020

pdf bib
Proceedings of the Fourth Workshop on Neural Generation and Translation
Alexandra Birch | Andrew Finch | Hiroaki Hayashi | Kenneth Heafield | Marcin Junczys-Dowmunt | Ioannis Konstas | Xian Li | Graham Neubig | Yusuke Oda
Proceedings of the Fourth Workshop on Neural Generation and Translation

2019

pdf bib
Minimally-Augmented Grammatical Error Correction
Roman Grundkiewicz | Marcin Junczys-Dowmunt
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

There has been an increased interest in low-resource approaches to automatic grammatical error correction. We introduce Minimally-Augmented Grammatical Error Correction (MAGEC) that does not require any error-labelled data. Our unsupervised approach is based on a simple but effective synthetic error generation method based on confusion sets from inverted spell-checkers. In low-resource settings, we outperform the current state-of-the-art results for German and Russian GEC tasks by a large margin without using any real error-annotated training data. When combined with labelled data, our method can serve as an efficient pre-training technique

pdf bib
From Research to Production and Back: Ludicrously Fast Neural Machine Translation
Young Jin Kim | Marcin Junczys-Dowmunt | Hany Hassan | Alham Fikri Aji | Kenneth Heafield | Roman Grundkiewicz | Nikolay Bogoychev
Proceedings of the 3rd Workshop on Neural Generation and Translation

This paper describes the submissions of the “Marian” team to the WNGT 2019 efficiency shared task. Taking our dominating submissions to the previous edition of the shared task as a starting point, we develop improved teacher-student training via multi-agent dual-learning and noisy backward-forward translation for Transformer-based student models. For efficient CPU-based decoding, we propose pre-packed 8-bit matrix products, improved batched decoding, cache-friendly student architectures with parameter sharing and light-weight RNN-based decoder architectures. GPU-based decoding benefits from the same architecture changes, from pervasive 16-bit inference and concurrent streams. These modifications together with profiler-based C++ code optimization allow us to push the Pareto frontier established during the 2018 edition towards 24x (CPU) and 14x (GPU) faster models at comparable or higher BLEU values. Our fastest CPU model is more than 4x faster than last year’s fastest submission at more than 3 points higher BLEU. Our fastest GPU model at 1.5 seconds translation time is slightly faster than last year’s fastest RNN-based submissions, but outperforms them by more than 4 BLEU and 10 BLEU points respectively.

pdf bib
Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data
Roman Grundkiewicz | Marcin Junczys-Dowmunt | Kenneth Heafield
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Considerable effort has been made to address the data sparsity problem in neural grammatical error correction. In this work, we propose a simple and surprisingly effective unsupervised synthetic error generation method based on confusion sets extracted from a spellchecker to increase the amount of training data. Synthetic data is used to pre-train a Transformer sequence-to-sequence model, which not only improves over a strong baseline trained on authentic error-annotated data, but also enables the development of a practical GEC system in a scenario where little genuine error-annotated data is available. The developed systems placed first in the BEA19 shared task, achieving 69.47 and 64.24 F0.5 in the restricted and low-resource tracks respectively, both on the W&I+LOCNESS test set. On the popular CoNLL 2014 test set, we report state-of-the-art results of 64.16 M² for the submitted system, and 61.30 M² for the constrained system trained on the NUCLE and Lang-8 data.

pdf bib
Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation
Marcin Junczys-Dowmunt
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the Microsoft Translator submissions to the WMT19 news translation shared task for English-German. Our main focus is document-level neural machine translation with deep transformer models. We start with strong sentence-level baselines, trained on large-scale data created via data-filtering and noisy back-translation and find that back-translation seems to mainly help with translationese input. We explore fine-tuning techniques, deeper models and different ensembling strategies to counter these effects. Using document boundaries present in the authentic and synthetic parallel data, we create sequences of up to 1000 subword segments and train transformer translation models. We experiment with data augmentation techniques for the smaller authentic data with document-boundaries and for larger authentic data without boundaries. We further explore multi-task training for the incorporation of document-level source language monolingual data via the BERT-objective on the encoder and two-pass decoding for combinations of sentence-level and document-level systems. Based on preliminary human evaluation results, evaluators strongly prefer the document-level systems over our comparable sentence-level system. The document-level systems also seem to score higher than the human references in source-based direct assessment.

2018

pdf bib
Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task
Marcin Junczys-Dowmunt | Roman Grundkiewicz | Shubha Guha | Kenneth Heafield
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Previously, neural methods in grammatical error correction (GEC) did not reach state-of-the-art results compared to phrase-based statistical machine translation (SMT) baselines. We demonstrate parallels between neural GEC and low-resource neural MT and successfully adapt several methods from low-resource MT to neural GEC. We further establish guidelines for trustable results in neural GEC and propose a set of model-independent methods for neural GEC that can be easily applied in most GEC settings. Proposed methods include adding source-side noise, domain-adaptation techniques, a GEC-specific training-objective, transfer learning with monolingual data, and ensembling of independently trained GEC models and language models. The combined effects of these methods result in better than state-of-the-art neural GEC models that outperform previously best neural GEC systems by more than 10% M² on the CoNLL-2014 benchmark and 5.9% on the JFLEG test set. Non-neural state-of-the-art systems are outperformed by more than 2% on the CoNLL-2014 benchmark and by 4% on JFLEG.

pdf bib
Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation
Roman Grundkiewicz | Marcin Junczys-Dowmunt
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We combine two of the most popular approaches to automated Grammatical Error Correction (GEC): GEC based on Statistical Machine Translation (SMT) and GEC based on Neural Machine Translation (NMT). The hybrid system achieves new state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks. This GEC system preserves the accuracy of SMT output and, at the same time, generates more fluent sentences as it typical for NMT. Our analysis shows that the created systems are closer to reaching human-level performance than any other GEC system reported so far.

pdf bib
Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
Nikolay Bogoychev | Kenneth Heafield | Alham Fikri Aji | Marcin Junczys-Dowmunt
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In order to extract the best possible performance from asynchronous stochastic gradient descent one must increase the mini-batch size and scale the learning rate accordingly. In order to achieve further speedup we introduce a technique that delays gradient updates effectively increasing the mini-batch size. Unfortunately with the increase of mini-batch size we worsen the stale gradient problem in asynchronous stochastic gradient descent (SGD) which makes the model convergence poor. We introduce local optimizers which mitigate the stale gradient problem and together with fine tuning our momentum we are able to train a shallow machine translation system 27% faster than an optimized baseline with negligible penalty in BLEU.

pdf bib
Marian: Fast Neural Machine Translation in C++
Marcin Junczys-Dowmunt | Roman Grundkiewicz | Tomasz Dwojak | Hieu Hoang | Kenneth Heafield | Tom Neckermann | Frank Seide | Ulrich Germann | Alham Fikri Aji | Nikolay Bogoychev | André F. T. Martins | Alexandra Birch
Proceedings of ACL 2018, System Demonstrations

We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs. Marian is written entirely in C++. We describe the design of the encoder-decoder framework and demonstrate that a research-friendly toolkit can achieve high training and translation speed.

pdf bib
Are we experiencing the Golden Age of Automatic Post-Editing?
Marcin Junczys-Dowmunt
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing

pdf bib
Marian: Cost-effective High-Quality Neural Machine Translation in C++
Marcin Junczys-Dowmunt | Kenneth Heafield | Hieu Hoang | Roman Grundkiewicz | Anthony Aue
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

This paper describes the submissions of the “Marian” team to the WNMT 2018 shared task. We investigate combinations of teacher-student training, low-precision matrix products, auto-tuning and other methods to optimize the Transformer model on GPU and CPU. By further integrating these methods with the new averaging attention networks, a recently introduced faster Transformer variant, we create a number of high-quality, high-performance models on the GPU and CPU, dominating the Pareto frontier for this shared task.

pdf bib
Microsoft’s Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data
Marcin Junczys-Dowmunt
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the Microsoft submission to the WMT2018 news translation shared task. We participated in one language direction – English-German. Our system follows current best-practice and combines state-of-the-art models with new data filtering (dual conditional cross-entropy filtering) and sentence weighting methods. We trained fairly standard Transformer-big models with an updated version of Edinburgh’s training scheme for WMT2017 and experimented with different filtering schemes for Paracrawl. According to automatic metrics (BLEU) we reached the highest score for this subtask with a nearly 2 BLEU point margin over the next strongest system. Based on human evaluation we ranked first among constrained systems. We believe this is mostly caused by our data filtering/weighting regime.

pdf bib
MS-UEdin Submission to the WMT2018 APE Shared Task: Dual-Source Transformer for Automatic Post-Editing
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the Microsoft and University of Edinburgh submission to the Automatic Post-editing shared task at WMT2018. Based on training data and systems from the WMT2017 shared task, we re-implement our own models from the last shared task and introduce improvements based on extensive parameter sharing. Next we experiment with our implementation of dual-source transformer models and data selection for the IT domain. Our submissions decisively wins the SMT post-editing sub-task establishing the new state-of-the-art and is a very close second (or equal, 16.46 vs 16.50 TER) in the NMT sub-task. Based on the rather weak results in the NMT sub-task, we hypothesize that neural-on-neural APE might not be actually useful.

pdf bib
Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora
Marcin Junczys-Dowmunt
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

In this work we introduce dual conditional cross-entropy filtering for noisy parallel data. For each sentence pair of the noisy parallel corpus we compute cross-entropy scores according to two inverse translation models trained on clean data. We penalize divergent cross-entropies and weigh the penalty by the cross-entropy average of both models. Sorting or thresholding according to these scores results in better subsets of parallel data. We achieve higher BLEU scores with models trained on parallel data filtered only from Paracrawl than with models trained on clean WMT data. We further evaluate our method in the context of the WMT2018 shared task on parallel corpus filtering and achieve the overall highest ranking scores of the shared task, scoring top in three out of four subtasks.

2017

pdf bib
Nematus: a Toolkit for Neural Machine Translation
Rico Sennrich | Orhan Firat | Kyunghyun Cho | Alexandra Birch | Barry Haddow | Julian Hitschler | Marcin Junczys-Dowmunt | Samuel Läubli | Antonio Valerio Miceli Barone | Jozef Mokry | Maria Nădejde
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present Nematus, a toolkit for Neural Machine Translation. The toolkit prioritizes high translation accuracy, usability, and extensibility. Nematus has been used to build top-performing submissions to shared translation tasks at WMT and IWSLT, and has been used to train systems for production environments.

pdf bib
The SUMMA Platform Prototype
Renars Liepins | Ulrich Germann | Guntis Barzdins | Alexandra Birch | Steve Renals | Susanne Weber | Peggy van der Kreeft | Hervé Bourlard | João Prieto | Ondřej Klejch | Peter Bell | Alexandros Lazaridis | Alfonso Mendes | Sebastian Riedel | Mariana S. C. Almeida | Pedro Balage | Shay B. Cohen | Tomasz Dwojak | Philip N. Garner | Andreas Giefer | Marcin Junczys-Dowmunt | Hina Imran | David Nogueira | Ahmed Ali | Sebastião Miranda | Andrei Popescu-Belis | Lesly Miculicich Werlen | Nikos Papasarantopoulos | Abiola Obamuyide | Clive Jones | Fahim Dalvi | Andreas Vlachos | Yang Wang | Sibo Tong | Rico Sennrich | Nikolaos Pappas | Shashi Narayan | Marco Damonte | Nadir Durrani | Sameer Khurana | Ahmed Abdelali | Hassan Sajjad | Stephan Vogel | David Sheppey | Chris Hernon | Jeff Mitchell
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.

pdf bib
An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this work, we explore multiple neural architectures adapted for the task of automatic post-editing of machine translation output. We focus on neural end-to-end models that combine both inputs mt (raw MT output) and src (source language input) in a single neural architecture, modeling {mt, src} → pe directly. Apart from that, we investigate the influence of hard-attention models which seem to be well-suited for monolingual tasks, as well as combinations of both ideas. We report results on data sets provided during the WMT-2016 shared task on automatic post-editing and can demonstrate that dual-attention models that incorporate all available data in the APE scenario in a single model improve on the best shared task system and on all other published results after the shared task. Dual-attention models that are combined with hard attention remain competitive despite applying fewer changes to the input.

pdf bib
Pushing the Limits of Translation Quality Estimation
André F. T. Martins | Marcin Junczys-Dowmunt | Fabio N. Kepler | Ramón Astudillo | Chris Hokamp | Roman Grundkiewicz
Transactions of the Association for Computational Linguistics, Volume 5

Translation quality estimation is a task of growing importance in NLP, due to its potential to reduce post-editing human effort in disruptive ways. However, this potential is currently limited by the relatively low accuracy of existing systems. In this paper, we achieve remarkable improvements by exploiting synergies between the related tasks of word-level quality estimation and automatic post-editing. First, we stack a new, carefully engineered, neural model into a rich feature-based word-level quality estimation system. Then, we use the output of an automatic post-editing system as an extra feature, obtaining striking results on WMT16: a word-level FMULT1 score of 57.47% (an absolute gain of +7.95% over the current state of the art), and a Pearson correlation score of 65.56% for sentence-level HTER prediction (an absolute gain of +13.36%).

pdf bib
Predicting Target Language CCG Supertags Improves Neural Machine Translation
Maria Nădejde | Siva Reddy | Rico Sennrich | Tomasz Dwojak | Marcin Junczys-Dowmunt | Philipp Koehn | Alexandra Birch
Proceedings of the Second Conference on Machine Translation

pdf bib
The AMU-UEdin Submission to the WMT 2017 Shared Task on Automatic Post-Editing
Marcin Junczys-Dowmunt | Marcin Junczys-Dowmunt
Proceedings of the Second Conference on Machine Translation

pdf bib
The AMU-UEdin Submission to the WMT 2017 Shared Task on Automatic Post-Editing
Marcin Junczys-Dowmunt | Marcin Junczys-Dowmunt
Proceedings of the Second Conference on Machine Translation

2016

pdf bib
The United Nations Parallel Corpus v1.0
Michał Ziemski | Marcin Junczys-Dowmunt | Bruno Pouliquen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the creation process and statistics of the official United Nations Parallel Corpus, the first parallel corpus composed from United Nations documents published by the original data creator. The parallel corpus presented consists of manually translated UN documents from the last 25 years (1990 to 2014) for the six official UN languages, Arabic, Chinese, English, French, Russian, and Spanish. The corpus is freely available for download under a liberal license. Apart from the pairwise aligned documents, a fully aligned subcorpus for the six official UN languages is distributed. We provide baseline BLEU scores of our Moses-based SMT systems trained with the full data of language pairs involving English and for all possible translation directions of the six-way subcorpus.

pdf bib
Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Target-Side Context for Discriminative Models in Statistical Machine Translation
Aleš Tamchyna | Alexander Fraser | Ondřej Bojar | Marcin Junczys-Dowmunt
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
The AMU-UEDIN Submission to the WMT16 News Translation Task: Attention-based NMT Models as Feature Functions in Phrase-based SMT
Marcin Junczys-Dowmunt | Tomasz Dwojak | Rico Sennrich
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Log-linear Combinations of Monolingual and Bilingual Neural Machine Translation Models for Automatic Post-Editing
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

pdf bib
Human Evaluation of Grammatical Error Correction Systems
Roman Grundkiewicz | Marcin Junczys-Dowmunt | Edward Gillian
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
SMT at the International Maritime Organization experiences with combining in-house corpus with more general corpus
Bruno Pouliquen | Marcin Junczys-Dowmunt | Blanca Pinero | Michał Ziemski
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
SMT at the International Maritime Organization: experiences with combining in-house corpora with out-of-domain corpora
Bruno Pouliquen | Marcin Junczys-Dowmunt | Blanca Pinero | Michal Ziemski
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

pdf bib
The AMU System in the CoNLL-2014 Shared Task: Grammatical Error Correction by Data-Intensive and Feature-Rich Statistical Machine Translation
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
SMT of German patents at WIPO: decompounding and verb structure pre-reordering
Marcin Junczys-Dowmunt | Bruno Pouliquen
Proceedings of the 17th Annual conference of the European Association for Machine Translation

2012

pdf bib
A Phrase Table without Phrases: Rank Encoding for Better Phrase Table Compression
Marcin Junczys-Dowmunt
Proceedings of the 16th Annual conference of the European Association for Machine Translation