Mauro Cettolo


2018

A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation
Surafel Melaku Lakew | Mauro Cettolo | Marcello Federico
Proceedings of the 27th International Conference on Computational Linguistics

Recently, neural machine translation (NMT) has been extended to multilinguality, that is, to handle more than one translation direction with a single system. Multilingual NMT showed competitive performance against pure bilingual systems. Notably, in low-resource settings, it proved to work effectively and efficiently, thanks to a shared representation space that is forced across languages and induces a sort of transfer-learning. Furthermore, multilingual NMT enables so-called zero-shot inference across language pairs never seen at training time. Despite the increasing interest in this framework, an in-depth analysis of what a multilingual NMT model is capable of and what it is not is still missing. Motivated by this, our work (i) provides a quantitative and comparative analysis of the translations produced by bilingual, multilingual and zero-shot systems; (ii) investigates the translation quality of two of the currently dominant neural architectures in MT, which are the Recurrent and the Transformer ones; and (iii) quantitatively explores how the closeness between languages influences the zero-shot translation. Our analysis leverages multiple professional post-edits of automatic translations by several different systems and focuses both on automatic standard metrics (BLEU and TER) and on widely used error categories, which are lexical, morphology, and word order errors.

2017

Findings of the 2017 DiscoMT Shared Task on Cross-lingual Pronoun Prediction
Sharid Loáiciga | Sara Stymne | Preslav Nakov | Christian Hardmeier | Jörg Tiedemann | Mauro Cettolo | Yannick Versley
Proceedings of the Third Workshop on Discourse in Machine Translation

We describe the design, the setup, and the evaluation results of the DiscoMT 2017 shared task on cross-lingual pronoun prediction. The task asked participants to predict a target-language pronoun given a source-language pronoun in the context of a sentence. We further provided a lemmatized target-language human-authored translation of the source sentence, and automatic word alignments between the source sentence words and the target-language lemmata. The aim of the task was to predict, for each target-language pronoun placeholder, the word that should replace it from a small, closed set of classes, using any type of information that can be extracted from the entire document. We offered four subtasks, each for a different language pair and translation direction: English-to-French, English-to-German, German-to-English, and Spanish-to-English. Five teams participated in the shared task, making submissions for all language pairs. The evaluation results show that most participating teams outperformed two strong n-gram-based language model-based baseline systems by a sizable margin.

2016

WAGS: A Beautiful English-Italian Benchmark Supporting Word Alignment Evaluation on Rare Words
Luisa Bentivogli | Mauro Cettolo | M. Amin Farajian | Marcello Federico
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents WAGS (Word Alignment Gold Standard), a novel benchmark which allows extensive evaluation of WA tools on out-of-vocabulary (OOV) and rare words. WAGS is a subset of the Common Test section of the Europarl English-Italian parallel corpus, and is specifically tailored to OOV and rare words. WAGS is composed of 6,715 sentence pairs containing 11,958 occurrences of OOV and rare words up to frequency 15 in the Europarl Training set (5,080 English words and 6,878 Italian words), representing almost 3% of the whole text. Since WAGS is focused on OOV/rare words, manual alignments are provided for these words only, and not for the whole sentences. Two off-the-shelf word aligners have been evaluated on WAGS, and results have been compared to those obtained on an existing benchmark tailored to full text alignment. The results obtained confirm that WAGS is a valuable resource, which allows a statistically sound evaluation of WA systems’ performance on OOV and rare words, as well as extensive data analyses. WAGS is publicly released under a Creative Commons Attribution license.

Neural versus Phrase-Based Machine Translation Quality: a Case Study
Luisa Bentivogli | Arianna Bisazza | Mauro Cettolo | Marcello Federico
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Findings of the 2016 WMT Shared Task on Cross-lingual Pronoun Prediction
Liane Guillou | Christian Hardmeier | Preslav Nakov | Sara Stymne | Jörg Tiedemann | Yannick Versley | Mauro Cettolo | Bonnie Webber | Andrei Popescu-Belis
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

Pronoun-Focused MT and Cross-Lingual Pronoun Prediction: Findings of the 2015 DiscoMT Shared Task on Pronoun Translation
Christian Hardmeier | Preslav Nakov | Sara Stymne | Jörg Tiedemann | Yannick Versley | Mauro Cettolo
Proceedings of the Second Workshop on Discourse in Machine Translation

2014

The MateCat Tool
Marcello Federico | Nicola Bertoldi | Mauro Cettolo | Matteo Negri | Marco Turchi | Marco Trombetti | Alessandro Cattelan | Antonio Farina | Domenico Lupinetti | Andrea Martines | Alberto Massidda | Holger Schwenk | Loïc Barrault | Frederic Blain | Philipp Koehn | Christian Buck | Ulrich Germann
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

Proceedings of the 17th Annual conference of the European Association for Machine Translation
Mauro Cettolo | Marcello Federico | Lucia Specia | Andy Way
Proceedings of the 17th Annual conference of the European Association for Machine Translation

2013

Online Learning Approaches in Computer Assisted Translation
Prashant Mathur | Mauro Cettolo | Marcello Federico
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

Proceedings of the 16th Annual conference of the European Association for Machine Translation
Mauro Cettolo | Marcello Federico | Lucia Specia | Andy Way
Proceedings of the 16th Annual conference of the European Association for Machine Translation

WIT3: Web Inventory of Transcribed and Translated Talks
Mauro Cettolo | Christian Girardi | Marcello Federico
Proceedings of the 16th Annual conference of the European Association for Machine Translation

The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation
Marcello Federico | Sebastian Stüker | Luisa Bentivogli | Michael Paul | Mauro Cettolo | Teresa Herrmann | Jan Niehues | Giovanni Moretti
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We report here on the eighth evaluation campaign organized in 2011 by the IWSLT workshop series. That IWSLT 2011 evaluation focused on the automatic translation of public talks and included tracks for speech recognition, speech translation, text translation, and system combination. Unlike in previous years, all data supplied for the evaluation has been publicly released on the workshop website, and is at the disposal of researchers interested in working on our benchmarks and in comparing their results with those published at the workshop. This paper provides an overview of the IWSLT 2011 evaluation campaign, and describes the data supplied, the evaluation infrastructure made available to participants, and the subjective evaluation carried out.

Evaluating the Learning Curve of Domain Adaptive Statistical Machine Translation Systems
Nicola Bertoldi | Mauro Cettolo | Marcello Federico | Christian Buck
Proceedings of the Seventh Workshop on Statistical Machine Translation

2011

Bootstrapping Arabic-Italian SMT through Comparable Texts and Pivot Translation
Mauro Cettolo | Nicola Bertoldi | Marcello Federico
Proceedings of the 15th Annual conference of the European Association for Machine Translation

2010

Statistical Machine Translation of Texts with Misspelled Words
Nicola Bertoldi | Mauro Cettolo | Marcello Federico
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Online Language Model adaptation via N-gram Mixtures for Statistical Machine Translation
Germán Sanchis-Trilles | Mauro Cettolo
Proceedings of the 14th Annual conference of the European Association for Machine Translation

2007

Efficient Handling of N-gram Language Models for Statistical Machine Translation
Marcello Federico | Mauro Cettolo
Proceedings of the Second Workshop on Statistical Machine Translation

2006

Maximum Entropy Tagging with Binary and Real-Valued Features
Vanessa Sandrini | Marcello Federico | Mauro Cettolo
Proceedings of the Workshop on Learning Structured Information in Natural Language Applications

A Web-based Demonstrator of a Multi-lingual Phrase-based Translation System
Roldano Cattoni | Nicola Bertoldi | Mauro Cettolo | Boxing Chen | Marcello Federico
Demonstrations