Markus Freitag


2020

Translationese as a Language in “Multilingual” NMT
Parker Riley | Isaac Caswell | Markus Freitag | David Grangier
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Machine translation has an undesirable propensity to produce “translationese” artifacts, which can lead to higher BLEU scores while being liked less by human raters. Motivated by this, we model translationese and original (i.e. natural) text as separate languages in a multilingual model, and pose the question: can we perform zero-shot translation between original source text and original target text? There is no data with original source and original target, so we train a sentence-level classifier to distinguish translationese from original target text, and use this classifier to tag the training data for an NMT model. Using this technique we bias the model to produce more natural outputs at test time, yielding gains in human evaluation scores on both accuracy and fluency. Additionally, we demonstrate that it is possible to bias the model to produce translationese and game the BLEU score, increasing it while decreasing human-rated quality. We analyze these outputs using metrics measuring the degree of translationese, and present an analysis of the volatility of heuristic-based train-data tagging.
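The tagging step can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: the classifier is a callable that returns a probability that a target sentence is translationese, the tag is prepended to the source side (as target-language tags are in multilingual NMT), and the tag strings and the 0.5 threshold are hypothetical choices.

```python
from typing import Callable, List, Tuple

# Hypothetical tag tokens; the paper's actual tag strings may differ.
TRANSLATIONESE_TAG = "<translationese>"
ORIGINAL_TAG = "<original>"


def tag_training_data(
    pairs: List[Tuple[str, str]],
    translationese_prob: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Prepend a tag to each source sentence based on how the classifier
    judges the *target* side of the pair (translationese vs. original)."""
    tagged = []
    for src, tgt in pairs:
        is_translationese = translationese_prob(tgt) >= threshold
        tag = TRANSLATIONESE_TAG if is_translationese else ORIGINAL_TAG
        tagged.append((f"{tag} {src}", tgt))
    return tagged


def prepare_test_input(src: str) -> str:
    """At test time, ask the model for natural (original-style) output by
    prepending the 'original' tag, i.e. the zero-shot direction."""
    return f"{ORIGINAL_TAG} {src}"
```

With a toy classifier such as `lambda t: 0.9 if "hereby" in t else 0.1`, the pair `("Hiermit", "I hereby declare")` is tagged as translationese and `("Hallo", "Hello")` as original. Swapping the test-time tag to `TRANSLATIONESE_TAG` corresponds to the BLEU-gaming direction the abstract describes.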

KoBE: Knowledge-Based Machine Translation Evaluation
Zorik Gekhman | Roee Aharoni | Genady Beryozkin | Markus Freitag | Wolfgang Macherey
Findings of the Association for Computational Linguistics: EMNLP 2020

We propose a simple and effective method for machine translation evaluation which does not require reference translations. Our approach is based on (1) grounding the entity mentions found in each source sentence and candidate translation against a large-scale multilingual knowledge base, and (2) measuring the recall of the grounded entities found in the candidate vs. those found in the source. Our approach achieves the highest correlation with human judgements on 9 out of the 18 language pairs from the WMT19 benchmark for evaluation without references, which is the largest number of wins for a single evaluation method on this task. On 4 language pairs, we also achieve higher correlation with human judgements than BLEU. To foster further research, we release a dataset containing 1.8 million grounded entity mentions across 18 language pairs from the WMT19 metrics track data.
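Step (2) of this method, the recall of grounded entities, can be sketched as follows. The sketch assumes that entity linking has already mapped each mention to a knowledge-base ID (the IDs below are hypothetical placeholders), that repeated entities are matched with clipped counts, and that a source with no grounded entities yields no score. The paper's exact handling of these cases may differ.

```python
from collections import Counter
from typing import Iterable, Optional


def entity_recall(
    source_entities: Iterable[str], candidate_entities: Iterable[str]
) -> Optional[float]:
    """Fraction of grounded source entities that also appear in the candidate.

    Both inputs are knowledge-base IDs produced by an (assumed) entity linker.
    Counts are clipped, so an entity mentioned twice in the source must be
    found twice in the candidate to be fully recalled.
    """
    src = Counter(source_entities)
    total = sum(src.values())
    if total == 0:
        return None  # nothing to recall; how to score such sentences is an assumption
    cand = Counter(candidate_entities)
    matched = sum(min(count, cand[entity]) for entity, count in src.items())
    return matched / total
```

For example, a source grounded to `["Q64", "Q183", "Q64"]` and a candidate grounded to `["Q64", "Q183"]` recall two of three entity mentions, giving 2/3. Because no reference is involved, the score can be computed for any source and candidate pair.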

BLEU might be Guilty but References are not Innocent
Markus Freitag | David Grangier | Isaac Caswell
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The quality of automatic metrics for machine translation has been increasingly called into question, especially for high-quality systems. This paper demonstrates that, while choice of metric is important, the nature of the references is also critical. We study different methods to collect references and compare their value in automated evaluation by reporting correlation with human evaluation for a variety of systems and metrics. Motivated by the finding that typical references exhibit poor diversity, concentrating around translationese language, we develop a paraphrasing task for linguists to perform on existing reference translations, which counteracts this bias. Our method yields higher correlation with human judgment not only for the submissions of WMT 2019 English to German, but also for back-translation and APE-augmented MT output, which have been shown to have low correlation with automatic metrics using standard references. We demonstrate that our methodology improves correlation with all modern evaluation metrics we look at, including embedding-based methods. To complete this picture, we reveal that multi-reference BLEU does not improve the correlation for high-quality output, and present an alternative multi-reference formulation that is more effective.

2019

APE at Scale and Its Implications on MT Evaluation Biases
Markus Freitag | Isaac Caswell | Scott Roy
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

In this work, we train an Automatic Post-Editing (APE) model and use it to reveal biases in standard MT evaluation procedures. The goal of our APE model is to correct typical errors introduced by the translation process, and convert the “translationese” output into natural text. Our APE model is trained entirely on monolingual data that has been round-trip translated through English, to mimic errors that are similar to the ones introduced by NMT. We apply our model to the output of existing NMT systems, and demonstrate that, while the human-judged quality improves in all cases, BLEU scores drop with forward-translated test sets. We verify these results for the WMT18 English to German, WMT15 English to French, and WMT16 English to Romanian tasks. Furthermore, we selectively apply our APE model on the output of the top submissions of the most recent WMT evaluation campaigns. We see quality improvements on all tasks of up to 2.5 BLEU points.
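The way the APE training data is built, using round-trip translation of monolingual text through English, can be sketched as follows. The two translation functions are hypothetical stand-ins for whatever NMT systems produce the round trip. Each resulting pair uses the round-trip output as the APE input and the untouched original sentence as the target.

```python
from typing import Callable, List, Tuple


def build_ape_pairs(
    monolingual: List[str],
    to_english: Callable[[str], str],
    from_english: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Create (APE input, APE target) pairs from target-language monolingual text.

    The round-trip output is meant to carry translation-like errors similar to
    those introduced by NMT, while the original sentence is the natural text
    the APE model should learn to produce.
    """
    pairs = []
    for sentence in monolingual:
        round_trip = from_english(to_english(sentence))
        pairs.append((round_trip, sentence))
    return pairs
```

With toy translators that map "Haus" to "house" and "house" back to "Gebäude", the sentence "Das Haus ist alt" yields the pair `("Das Gebäude ist alt", "Das Haus ist alt")`. Because no parallel data is needed, the approach scales with the amount of available monolingual text.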

2018

Unsupervised Natural Language Generation with Denoising Autoencoders
Markus Freitag | Scott Roy
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Generating text from structured data is important for various tasks such as question answering and dialog systems. We show that in at least one domain, without any supervision and only based on unlabeled text, we are able to build a Natural Language Generation (NLG) system with higher performance than supervised approaches. In our approach, we interpret the structured data as a corrupt representation of the desired output and use a denoising auto-encoder to reconstruct the sentence. We show how to introduce noise into training examples that do not contain structured data, and that the resulting denoising auto-encoder generalizes to generate correct sentences when given structured data.
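The abstract does not spell out the noise model. One plausible sketch of "introducing noise into training examples" drops words at random and shuffles the survivors, so that a fluent sentence comes to resemble unordered structured data with missing function words. Both the drop-and-shuffle scheme and the 0.3 drop probability are assumptions made for illustration, not the paper's exact procedure. The denoising auto-encoder is then trained to map the corrupted string back to the original sentence.

```python
import random
from typing import Optional, Tuple


def corrupt(sentence: str, drop_prob: float = 0.3,
            rng: Optional[random.Random] = None) -> str:
    """Hypothetical noise function: drop each token with probability
    `drop_prob`, then shuffle the remaining tokens."""
    rng = rng or random.Random()
    tokens = sentence.split()
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if tokens and not kept:
        kept = [rng.choice(tokens)]  # keep at least one token as input
    rng.shuffle(kept)  # structured data carries no word order
    return " ".join(kept)


def make_training_pair(sentence: str, drop_prob: float = 0.3,
                       rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """(noisy input, clean target) pair for the denoising auto-encoder."""
    return corrupt(sentence, drop_prob, rng), sentence
```

Passing a seeded `random.Random` makes the corruption reproducible. At inference time, the structured data itself (for example, a list of slot values) would take the place of the corrupted input.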

2017

Beam Search Strategies for Neural Machine Translation
Markus Freitag | Yaser Al-Onaizan
Proceedings of the First Workshop on Neural Machine Translation

The basic concept in Neural Machine Translation (NMT) is to train a large Neural Network that maximizes the translation performance on a given parallel corpus. NMT then uses a simple left-to-right beam-search decoder to generate new translations that approximately maximize the trained conditional probability. The current beam search strategy generates the target sentence word by word from left to right while keeping a fixed number of active candidates at each time step. First, this simple search is less adaptive, as it also expands candidates whose scores are much worse than the current best. Second, it does not expand hypotheses that fall outside the best-scoring candidates, even if their scores are close to the best one. The latter problem can be avoided by increasing the beam size until no further performance improvement is observed. While this can reach better performance, it has the drawback of slower decoding. In this paper, we concentrate on speeding up the decoder by applying a more flexible beam search strategy whose candidate size may vary at each time step depending on the candidate scores. We speed up the original decoder by up to 43% for the two language pairs German to English and Chinese to English without losing any translation quality.
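A score-dependent candidate size can be illustrated with one such pruning rule, a relative threshold. After the usual top-k cut, any candidate whose probability falls below a fixed fraction of the best candidate's probability is dropped, so fewer hypotheses are expanded when the best one clearly dominates. This is a sketch under stated assumptions: scores are log-probabilities, and the 0.6 threshold is a hypothetical value, not one reported in the abstract.

```python
import math
from typing import List, Tuple

Candidate = Tuple[str, float]  # (hypothesis, log-probability)


def prune_candidates(candidates: List[Candidate], beam_size: int,
                     relative_threshold: float = 0.6) -> List[Candidate]:
    """Keep at most `beam_size` candidates, then drop any candidate whose
    probability is below `relative_threshold` times the best probability.

    In log space, p(c) >= r * p(best) becomes
    log p(c) >= log r + log p(best). Requires 0 < relative_threshold <= 1.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    if not ranked:
        return []
    cutoff = ranked[0][1] + math.log(relative_threshold)
    return [c for c in ranked if c[1] >= cutoff]
```

For candidates with probabilities 0.5, 0.4, and 0.1 and a beam size of 3, the cutoff is 0.6 × 0.5 = 0.3, so only the first two survive. The effective beam at this step shrinks from 3 to 2 without touching the best hypothesis.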

2015

Local System Voting Feature for Machine Translation System Combination
Markus Freitag | Jan-Thorsten Peter | Stephan Peitz | Minwei Feng | Hermann Ney
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

Jane: Open Source Machine Translation System Combination
Markus Freitag | Matthias Huck | Hermann Ney
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

EU-BRIDGE MT: Combined Machine Translation
Markus Freitag | Stephan Peitz | Joern Wuebker | Hermann Ney | Matthias Huck | Rico Sennrich | Nadir Durrani | Maria Nadejde | Philip Williams | Philipp Koehn | Teresa Herrmann | Eunah Cho | Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation

The RWTH Aachen German-English Machine Translation System for WMT 2014
Stephan Peitz | Joern Wuebker | Markus Freitag | Hermann Ney
Proceedings of the Ninth Workshop on Statistical Machine Translation

2013

A Performance Study of Cube Pruning for Large-Scale Hierarchical Machine Translation
Matthias Huck | David Vilar | Markus Freitag | Hermann Ney
Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation

Joint WMT 2013 Submission of the QUAERO Project
Stephan Peitz | Saab Mansour | Matthias Huck | Markus Freitag | Hermann Ney | Eunah Cho | Teresa Herrmann | Mohammed Mediani | Jan Niehues | Alex Waibel | Alexander Allauzen | Quoc Khanh Do | Bianka Buschbeck | Tonio Wandmacher
Proceedings of the Eighth Workshop on Statistical Machine Translation

The RWTH Aachen Machine Translation System for WMT 2013
Stephan Peitz | Saab Mansour | Jan-Thorsten Peter | Christoph Schmidt | Joern Wuebker | Matthias Huck | Markus Freitag | Hermann Ney
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

Discriminative Reordering Extensions for Hierarchical Phrase-Based Machine Translation
Matthias Huck | Stephan Peitz | Markus Freitag | Hermann Ney
Proceedings of the 16th Annual conference of the European Association for Machine Translation

Review of Hypothesis Alignment Algorithms for MT System Combination via Confusion Network Decoding
Antti-Veikko Rosti | Xiaodong He | Damianos Karakos | Gregor Leusch | Yuan Cao | Markus Freitag | Spyros Matsoukas | Hermann Ney | Jason Smith | Bing Zhang
Proceedings of the Seventh Workshop on Statistical Machine Translation

The RWTH Aachen Machine Translation System for WMT 2012
Matthias Huck | Stephan Peitz | Markus Freitag | Malte Nuhn | Hermann Ney
Proceedings of the Seventh Workshop on Statistical Machine Translation

Joint WMT 2012 Submission of the QUAERO Project
Markus Freitag | Stephan Peitz | Matthias Huck | Hermann Ney | Jan Niehues | Teresa Herrmann | Alex Waibel | Hai-son Le | Thomas Lavergne | Alexandre Allauzen | Bianka Buschbeck | Josep Maria Crego | Jean Senellart
Proceedings of the Seventh Workshop on Statistical Machine Translation

Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation
Joern Wuebker | Matthias Huck | Stephan Peitz | Malte Nuhn | Markus Freitag | Jan-Thorsten Peter | Saab Mansour | Hermann Ney
Proceedings of COLING 2012: Demonstration Papers

2011

The RWTH System Combination System for WMT 2011
Gregor Leusch | Markus Freitag | Hermann Ney
Proceedings of the Sixth Workshop on Statistical Machine Translation

Joint WMT Submission of the QUAERO Project
Markus Freitag | Gregor Leusch | Joern Wuebker | Stephan Peitz | Hermann Ney | Teresa Herrmann | Jan Niehues | Alex Waibel | Alexandre Allauzen | Gilles Adda | Josep Maria Crego | Bianka Buschbeck | Tonio Wandmacher | Jean Senellart
Proceedings of the Sixth Workshop on Statistical Machine Translation

The RWTH Aachen Machine Translation System for WMT 2011
Matthias Huck | Joern Wuebker | Christoph Schmidt | Markus Freitag | Stephan Peitz | Daniel Stein | Arnaud Dagnelies | Saab Mansour | Gregor Leusch | Hermann Ney
Proceedings of the Sixth Workshop on Statistical Machine Translation