David Vilar


2020

The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020
Tobias Domhan | Michael Denkowski | David Vilar | Xing Niu | Felix Hieber | Kenneth Heafield
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Sockeye 2: A Toolkit for Neural Machine Translation
Felix Hieber | Tobias Domhan | Michael Denkowski | David Vilar
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

We present Sockeye 2, a modernized and streamlined version of the Sockeye neural machine translation (NMT) toolkit. New features include a simplified code base through the use of MXNet’s Gluon API, a focus on state-of-the-art model architectures, and distributed mixed-precision training. These improvements result in faster training and inference, higher automatic metric scores, and a shorter path from research to production.

2019

Automatic error classification with multiple error labels
Maja Popovic | David Vilar
Proceedings of Machine Translation Summit XVII Volume 1: Research Track

2018

Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation
Matt Post | David Vilar
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

The end-to-end nature of neural machine translation (NMT) removes many ways of manually guiding the translation process that were available in older paradigms. Recent work, however, has introduced a new capability: lexically constrained or guided decoding, a modification to beam search that forces the inclusion of pre-specified words and phrases in the output. However, while theoretically sound, existing approaches have computational complexities that are either linear (Hokamp and Liu, 2017) or exponential (Anderson et al., 2017) in the number of constraints. We present an algorithm for lexically constrained decoding with a complexity of O(1) in the number of constraints. We demonstrate the algorithm’s remarkable ability to properly place these constraints, and use it to explore the shaky relationship between model and BLEU scores. Our implementation is available as part of Sockeye.

Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models
David Vilar
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

In this paper we explore the use of Learning Hidden Unit Contribution for the task of neural machine translation. The method was initially proposed in the context of speech recognition for adapting a general system to the specific acoustic characteristics of each speaker. Similar in spirit, in a machine translation framework we want to adapt a general system to a specific domain. We show that the proposed method achieves improvements of up to 2.6 BLEU points over a general system, and up to 6 BLEU points if the initial system has only been trained on out-of-domain data, a situation which may easily happen in practice. The good performance together with its short training time and small memory footprint make it a very attractive solution for domain adaptation.

The Sockeye Neural Machine Translation Toolkit at AMTA 2018
Felix Hieber | Tobias Domhan | Michael Denkowski | David Vilar | Artem Sokolov | Ann Clifton | Matt Post
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

2014

Simple and Effective Approach for Consistent Training of Hierarchical Phrase-based Translation Models
Stephan Peitz | David Vilar | Hermann Ney
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

The taraXÜ corpus of human-annotated machine translations
Eleftherios Avramidis | Aljoscha Burchardt | Sabine Hunsicker | Maja Popović | Cindy Tscherwinka | David Vilar | Hans Uszkoreit
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Human translators are the key to evaluating machine translation (MT) quality and also to addressing the so far unanswered question of when and how to use MT in professional translation workflows. This paper describes the corpus developed as a result of a detailed large-scale human evaluation consisting of three tightly connected tasks: ranking, error classification and post-editing.

2013

A Performance Study of Cube Pruning for Large-Scale Hierarchical Machine Translation
Matthias Huck | David Vilar | Markus Freitag | Hermann Ney
Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation

2012

Towards the Integration of MT into a LSP Translation Workflow
David Vilar | Michael Schneider | Aljoscha Burchardt | Thomas Wedde
Proceedings of the 16th Annual conference of the European Association for Machine Translation

Involving Language Professionals in the Evaluation of Machine Translation
Eleftherios Avramidis | Aljoscha Burchardt | Christian Federmann | Maja Popović | Cindy Tscherwinka | David Vilar
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Significant breakthroughs in machine translation only seem possible if human translators are brought into the loop. While automatic evaluation and scoring mechanisms such as BLEU have enabled the fast development of systems, it is not clear how systems can meet real-world (quality) requirements in industrial translation scenarios today. The taraXÜ project paves the way for wide usage of hybrid machine translation outputs through various feedback loops in system development. In a consortium of research and industry partners, the project integrates human translators into the development process for rating and post-editing of machine translation outputs, thus collecting feedback for possible improvements.

DFKI’s SMT System for WMT 2012
David Vilar
Proceedings of the Seventh Workshop on Statistical Machine Translation

2011

Evaluate with Confidence Estimation: Machine ranking of translation outputs using grammatical features
Eleftherios Avramidis | Maja Popovic | David Vilar | Aljoscha Burchardt
Proceedings of the Sixth Workshop on Statistical Machine Translation

Evaluation without references: IBM1 scores as evaluation metrics
Maja Popović | David Vilar | Eleftherios Avramidis | Aljoscha Burchardt
Proceedings of the Sixth Workshop on Statistical Machine Translation

DFKI Hybrid Machine Translation System for WMT 2011 - On the Integration of SMT and RBMT
Jia Xu | Hans Uszkoreit | Casey Kennington | David Vilar | Xiaojun Zhang
Proceedings of the Sixth Workshop on Statistical Machine Translation

Lightly-Supervised Training for Hierarchical Phrase-Based Machine Translation
Matthias Huck | David Vilar | Daniel Stein | Hermann Ney
Proceedings of the First workshop on Unsupervised Learning in NLP

Advancements in Arabic-to-English Hierarchical Machine Translation
Matthias Huck | David Vilar | Daniel Stein | Hermann Ney
Proceedings of the 15th Annual conference of the European Association for Machine Translation

2010

Jane: Open Source Hierarchical Translation, Extended with Reordering and Lexicon Models
David Vilar | Daniel Stein | Matthias Huck | Hermann Ney
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2009

The RWTH Machine Translation System for WMT 2009
Maja Popović | David Vilar | Daniel Stein | Evgeny Matusov | Hermann Ney
Proceedings of the Fourth Workshop on Statistical Machine Translation

On LM Heuristics for the Cube Growing Algorithm
David Vilar | Hermann Ney
Proceedings of the 13th Annual conference of the European Association for Machine Translation

2007

Analysis and System Combination of Phrase- and N-Gram-Based Statistical Machine Translation Systems
Marta R. Costa-jussà | Josep M. Crego | David Vilar | José A. R. Fonollosa | José B. Mariño | Hermann Ney
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

Can We Translate Letters?
David Vilar | Jan-Thorsten Peter | Hermann Ney
Proceedings of the Second Workshop on Statistical Machine Translation

Human Evaluation of Machine Translation Through Binary System Comparisons
David Vilar | Gregor Leusch | Hermann Ney | Rafael E. Banchs
Proceedings of the Second Workshop on Statistical Machine Translation

2006

Error Analysis of Statistical Machine Translation Output
David Vilar | Jia Xu | Luis Fernando D’Haro | Hermann Ney
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Evaluation of automatic translation output is a difficult task. Several performance measures like Word Error Rate, Position Independent Word Error Rate and the BLEU and NIST scores are widely used and provide a useful tool for comparing different systems and evaluating improvements within a system. However, the interpretation of all of these measures is not at all clear, and the identification of the most prominent source of errors in a given system using these measures alone is not possible. Therefore, some analysis of the generated translations is needed in order to identify the main problems and to focus the research efforts. This area is, however, mostly unexplored, and few works have dealt with it until now. In this paper we present a framework for classification of the errors of a machine translation system and carry out an error analysis of the system used by the RWTH in the first TC-STAR evaluation.

2005

Comparison of generation strategies for interactive machine translation
Oliver Bender | Saša Hasan | David Vilar | Richard Zens | Hermann Ney
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

Augmenting a Small Parallel Text with Morpho-Syntactic Language
Maja Popović | David Vilar | Hermann Ney | Slobodan Jovičić | Zoran Šarić
Proceedings of the ACL Workshop on Building and Using Parallel Texts

Novel Reordering Approaches in Phrase-Based Statistical Machine Translation
Stephan Kanthak | David Vilar | Evgeny Matusov | Richard Zens | Hermann Ney
Proceedings of the ACL Workshop on Building and Using Parallel Texts

Preprocessing and Normalization for Automatic Evaluation of Machine Translation
Gregor Leusch | Nicola Ueffing | David Vilar | Hermann Ney
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization