Evgeny Matusov


2020

Flexible Customization of a Single Neural Machine Translation System with Multi-dimensional Metadata Inputs
Evgeny Matusov | Patrick Wilken | Christian Herold
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

Start-Before-End and End-to-End: Neural Speech Translation by AppTek and RWTH Aachen University
Parnia Bahar | Patrick Wilken | Tamer Alkhouli | Andreas Guta | Pavel Golik | Evgeny Matusov | Christian Herold
Proceedings of the 17th International Conference on Spoken Language Translation

AppTek and RWTH Aachen University team up to participate in the offline and simultaneous speech translation tracks of IWSLT 2020. For the offline task, we create both cascaded and end-to-end speech translation systems, paying careful attention to data selection and weighting. In the cascaded approach, we combine high-quality hybrid automatic speech recognition (ASR) with Transformer-based neural machine translation (NMT). Our end-to-end direct speech translation systems benefit from pretraining of adapted encoder and decoder components, as well as from synthetic data and fine-tuning, and are thus able to compete with cascaded systems in terms of MT quality. For simultaneous translation, we utilize a novel architecture that makes dynamic decisions, learned from parallel data, on when to continue consuming input and when to generate output words. Experiments with speech and text input show that this architecture leads to superior translation results even at low latency.

Neural Simultaneous Speech Translation Using Alignment-Based Chunking
Patrick Wilken | Tamer Alkhouli | Evgeny Matusov | Pavel Golik
Proceedings of the 17th International Conference on Spoken Language Translation

In simultaneous machine translation, the objective is to determine when to produce a partial translation given a continuous stream of source words, trading off latency against quality. We propose a neural machine translation (NMT) model that makes dynamic decisions on when to continue consuming input and when to generate output words. The model is composed of two main components: one that dynamically decides on ending a source chunk, and another that translates the consumed chunk. We train the components jointly and in a manner consistent with the inference conditions. To generate chunked training data, we propose a method that utilizes word alignment while also preserving enough context. We compare models with bidirectional and unidirectional encoders of different depths, on both real speech and text input. Our results on the IWSLT 2020 English-to-German task outperform a wait-k baseline by 2.6 to 3.7% BLEU absolute.

2019

Customizing Neural Machine Translation for Subtitling
Evgeny Matusov | Patrick Wilken | Yota Georgakopoulou
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

In this work, we customized a neural machine translation system for translation of subtitles in the domain of entertainment. The neural translation model was adapted to the subtitling content and style and extended by a simple, yet effective technique for utilizing inter-sentence context for short sentences such as dialog turns. The main contribution of the paper is a novel subtitle segmentation algorithm that predicts the end of a subtitle line given the previous word-level context, using a recurrent neural network trained on human segmentation decisions. This model is combined with subtitle length and duration constraints established in the subtitling industry. We conducted a thorough human evaluation with two post-editors (English-to-Spanish translation of a documentary and a sitcom). It showed a notable productivity increase of up to 37% compared to translating from scratch, and significant reductions in human translation edit rate in comparison with post-editing the output of the baseline non-adapted system without a learned segmentation model.
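The combination of a learned break model with industry-style length constraints can be pictured with a greedy sketch. The `break_prob` callback (standing in for the recurrent model), the 42-character limit, and the 0.5 threshold are illustrative assumptions, not the paper's trained system:

```python
def segment_subtitle(words, break_prob, max_chars=42, threshold=0.5):
    """Greedy subtitle line segmentation sketch: insert a line break
    after a word when the model's break probability is high, or when
    adding the next word would exceed the per-line character limit."""
    lines, current, length = [], [], 0
    for i, word in enumerate(words):
        add = len(word) + (1 if current else 0)  # +1 for the space
        if current and length + add > max_chars:
            # Hard constraint: the line is full, break here regardless.
            lines.append(" ".join(current))
            current, length = [], 0
            add = len(word)
        current.append(word)
        length += add
        if i < len(words) - 1 and break_prob(words, i) >= threshold:
            # Soft decision: the model predicts a line end after word i.
            lines.append(" ".join(current))
            current, length = [], 0
    if current:
        lines.append(" ".join(current))
    return lines
```

A real system would also have to respect the duration constraints mentioned in the abstract; this sketch only shows how model scores and a hard character limit can interact.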

The Challenges of Using Neural Machine Translation for Literature
Evgeny Matusov
Proceedings of the Qualities of Literary Machine Translation

2018

Can Neural Machine Translation be Improved with User Feedback?
Julia Kreutzer | Shahram Khadivi | Evgeny Matusov | Stefan Riezler
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

We present the first real-world application of methods for improving neural machine translation (NMT) with human reinforcement, based on explicit and implicit user feedback collected on the eBay e-commerce platform. Previous work has been confined to simulation experiments, whereas in this paper we work with real logged feedback for offline bandit learning of NMT parameters. We conduct a thorough analysis of the available explicit user judgments—five-star ratings of translation quality—and show that they are not reliable enough to yield significant improvements in bandit learning. In contrast, we successfully utilize implicit task-based feedback collected in a cross-lingual search task to improve task-specific and machine translation quality metrics.

Learning from Chunk-based Feedback in Neural Machine Translation
Pavel Petrushkov | Shahram Khadivi | Evgeny Matusov
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We empirically investigate learning from partial feedback in neural machine translation (NMT), when partial feedback is collected by asking users to highlight a correct chunk of a translation. We propose a simple and effective way of utilizing such feedback in NMT training. We demonstrate how the common machine translation problem of domain mismatch between training and deployment can be reduced solely based on chunk-level user feedback. We conduct a series of simulation experiments to test the effectiveness of the proposed method. Our results show that chunk-level feedback outperforms sentence-based feedback by up to 2.61% BLEU absolute.
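The idea of turning a highlighted chunk into a training signal can be sketched as a masked, weighted log-likelihood: only tokens inside the highlighted chunk contribute to the loss. The 0/1 weighting and the normalization below are illustrative assumptions, not the exact objective of the paper:

```python
def chunk_feedback_loss(token_logprobs, highlight_mask):
    """Sketch of chunk-level feedback as weighted NLL: tokens inside the
    user-highlighted chunk get weight 1, all other tokens weight 0.
    token_logprobs: model log-probabilities of the output tokens.
    highlight_mask: 1 if the token lies in the highlighted chunk, else 0."""
    weighted = [-lp * w for lp, w in zip(token_logprobs, highlight_mask)]
    n = sum(highlight_mask) or 1  # avoid division by zero if nothing is marked
    return sum(weighted) / n
```

Sentence-level feedback would correspond to an all-ones (or all-zeros) mask; the chunk-level version localizes the reward to the spans the user actually confirmed.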

Generating E-Commerce Product Titles and Predicting their Quality
José G. Camargo de Souza | Michael Kozielski | Prashant Mathur | Ernie Chang | Marco Guerini | Matteo Negri | Marco Turchi | Evgeny Matusov
Proceedings of the 11th International Conference on Natural Language Generation

E-commerce platforms present products using titles that summarize product information. These titles cannot be created by hand; therefore, an algorithmic solution is required. The task of automatically generating these titles from noisy user-provided titles is one way to achieve this goal. The setting requires the generation process to be fast and the generated title to be both human-readable and concise. Furthermore, we need to understand whether such generated titles are usable. As such, we propose approaches that (i) automatically generate product titles and (ii) predict their quality. Our approach scales to millions of products, and both automatic and human evaluations performed on real-world data indicate that our approaches are effective and applicable to existing e-commerce scenarios.

2017

Using Images to Improve Machine-Translating E-Commerce Product Listings.
Iacer Calixto | Daniel Stein | Evgeny Matusov | Pintu Lohar | Sheila Castilho | Andy Way
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In this paper we study the impact of using images to machine-translate user-generated e-commerce product listings. We study how a multi-modal Neural Machine Translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attentional NMT and a Statistical Machine Translation (SMT) model. User-generated product listings often do not constitute grammatical or well-formed sentences. More often than not, they consist of the juxtaposition of short phrases or keywords. We train our models end-to-end, as well as use text-only and multi-modal NMT models for re-ranking n-best lists generated by an SMT model. We qualitatively evaluate our user-generated training data and also analyse how adding synthetic data impacts the results. We evaluate our models quantitatively using BLEU and TER and find that (i) additional synthetic data has a general positive impact on text-only and multi-modal NMT models, and that (ii) using a multi-modal NMT model for re-ranking n-best lists improves TER significantly across different n-best list sizes.

Human Evaluation of Multi-modal Neural Machine Translation: A Case-Study on E-Commerce Listing Titles
Iacer Calixto | Daniel Stein | Evgeny Matusov | Sheila Castilho | Andy Way
Proceedings of the Sixth Workshop on Vision and Language

In this paper, we study how humans perceive the use of images as an additional knowledge source to machine-translate user-generated product listings in an e-commerce company. We conduct a human evaluation where we assess how a multi-modal neural machine translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attention-based NMT and a phrase-based statistical machine translation (PBSMT) model. We evaluate translations obtained with different systems and also discuss the data set of user-generated product listings, which in our case comprises both product listings and associated images. We found that humans preferred translations obtained with a PBSMT system to both text-only and multi-modal NMT over 56% of the time. Nonetheless, human evaluators ranked translations from a multi-modal NMT model as better than those of a text-only NMT over 88% of the time, which suggests that images do help NMT in this use-case.

Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search
Leonard Dahlmann | Evgeny Matusov | Pavel Petrushkov | Shahram Khadivi
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce a hybrid search for attention-based neural machine translation (NMT). A target phrase learned with statistical MT models extends a hypothesis in the NMT beam search when the attention of the NMT model focuses on the source words translated by this phrase. Phrases added in this way are scored with the NMT model, but also with SMT features including phrase-level translation probabilities and a target language model. Experimental results on German-to-English news domain and English-to-Russian e-commerce domain translation tasks show that using phrase-based models in NMT search improves MT quality by up to 2.3% BLEU absolute as compared to a strong NMT baseline.
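The coupling between NMT attention and phrase lookup described above can be pictured as follows: a phrase becomes a beam-extension candidate when its source span currently absorbs most of the attention mass. The span-keyed phrase table and the 0.5 threshold are simplifying assumptions for illustration, not the paper's actual hybrid search:

```python
def phrase_candidates(attention, phrase_table, threshold=0.5):
    """Sketch: return SMT phrases whose source span receives most of the
    NMT attention mass at the current decoding step; such phrases could
    then extend a hypothesis in the beam.
    attention: per-source-position attention weights for this step.
    phrase_table: maps (start, end) source spans to target phrases."""
    hits = []
    for (start, end), phrases in phrase_table.items():
        mass = sum(attention[start:end])
        if mass >= threshold:
            hits.extend(phrases)
    return hits
```

In the full system, candidates found this way are scored with both the NMT model and SMT features (phrase translation probabilities, a target language model) before being kept in the beam.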

2013

Selective Combination of Pivot and Direct Statistical Machine Translation Models
Ahmed El Kholy | Nizar Habash | Gregor Leusch | Evgeny Matusov | Hassan Sawaf
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Omnifluent English-to-French and Russian-to-English Systems for the 2013 Workshop on Statistical Machine Translation
Evgeny Matusov | Gregor Leusch
Proceedings of the Eighth Workshop on Statistical Machine Translation

Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation
Ahmed El Kholy | Nizar Habash | Gregor Leusch | Evgeny Matusov | Hassan Sawaf
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2009

The RWTH System Combination System for WMT 2009
Gregor Leusch | Evgeny Matusov | Hermann Ney
Proceedings of the Fourth Workshop on Statistical Machine Translation

The RWTH Machine Translation System for WMT 2009
Maja Popović | David Vilar | Daniel Stein | Evgeny Matusov | Hermann Ney
Proceedings of the Fourth Workshop on Statistical Machine Translation

Are Unaligned Words Important for Machine Translation?
Yuqi Zhang | Evgeny Matusov | Hermann Ney
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

2008

Tighter Integration of Rule-Based and Statistical MT in Serial System Combination
Nicola Ueffing | Jens Stephan | Evgeny Matusov | Loïc Dugast | George Foster | Roland Kuhn | Jean Senellart | Jin Yang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Complexity of Finding the BLEU-optimal Hypothesis in a Confusion Network
Gregor Leusch | Evgeny Matusov | Hermann Ney
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2006

Training a Statistical Machine Translation System without GIZA++
Arne Mauser | Evgeny Matusov | Hermann Ney
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The IBM Models (Brown et al., 1993) enjoy great popularity in the machine translation community because they offer high quality word alignments and a free implementation is available with the GIZA++ Toolkit (Och and Ney, 2003). Several methods have been developed to overcome the asymmetry of the alignment generated by the IBM Models. A remaining disadvantage, however, is the high model complexity. This paper describes a word alignment training procedure for statistical machine translation that uses a simple and clear statistical model, different from the IBM models. The main idea of the algorithm is to generate a symmetric and monotonic alignment between the target sentence and a permutation graph representing different reorderings of the words in the source sentence. The quality of the generated alignment is shown to be comparable to the standard GIZA++ training in an SMT setup.
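The notion of a monotonic alignment, independent of the paper's actual permutation-graph model, can be illustrated with a small dynamic program that aligns each target word to a non-decreasing source position. The cost matrix is an assumed stand-in for the lexical model:

```python
def monotone_alignment(cost):
    """Sketch of a monotonic alignment by dynamic programming: target
    position j is aligned to source position align[j], with align
    non-decreasing in j, minimizing the total lexical cost.
    cost[j][i]: cost of aligning target word j to source word i."""
    J, I = len(cost), len(cost[0])
    INF = float("inf")
    best = [[INF] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):
        best[0][i] = cost[0][i]
    for j in range(1, J):
        for i in range(I):
            # Monotonicity: the previous target word may only be aligned
            # to a source position p <= i.
            prev = min(range(i + 1), key=lambda p: best[j - 1][p])
            best[j][i] = best[j - 1][prev] + cost[j][i]
            back[j][i] = prev
    # Trace back the cheapest monotone path.
    end = min(range(I), key=lambda i: best[J - 1][i])
    align = [end]
    for j in range(J - 1, 0, -1):
        end = back[j][end]
        align.append(end)
    return list(reversed(align))
```

The paper's training procedure additionally searches over source reorderings encoded in a permutation graph, so that the monotone alignment is computed against a reordered source rather than the fixed original order.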

Computing Consensus Translation for Multiple Machine Translation Systems Using Enhanced Hypothesis Alignment
Evgeny Matusov | Nicola Ueffing | Hermann Ney
11th Conference of the European Chapter of the Association for Computational Linguistics

2005

Efficient statistical machine translation with constrained reordering
Evgeny Matusov | Stephan Kanthak | Hermann Ney
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

Novel Reordering Approaches in Phrase-Based Statistical Machine Translation
Stephan Kanthak | David Vilar | Evgeny Matusov | Richard Zens | Hermann Ney
Proceedings of the ACL Workshop on Building and Using Parallel Texts

2004

Improved Word Alignment Using a Symmetric Lexicon Model
Richard Zens | Evgeny Matusov | Hermann Ney
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Symmetric Word Alignments for Statistical Machine Translation
Evgeny Matusov | Richard Zens | Hermann Ney
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics