M. Amin Farajian

Also published as: Mohammad Amin Farajian


2020

Document-level Neural MT: A Systematic Comparison
António Lopes | M. Amin Farajian | Rachel Bawden | Michael Zhang | André F. T. Martins
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

In this paper we provide a systematic comparison of existing and new document-level neural machine translation solutions. As part of this comparison, we introduce and evaluate a document-level variant of the recently proposed Star Transformer architecture. In addition to the traditional BLEU metric, we report the accuracy of the models in handling anaphoric pronoun translation, as well as coherence and cohesion, using contrastive test sets. Finally, we report the results of a human evaluation in terms of Multidimensional Quality Metrics (MQM) and analyse how well the automatic metrics correlate with the human judgments.
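Contrastive test sets of this kind credit a model when it assigns a higher score to the correct realisation of a discourse phenomenon than to minimally altered incorrect variants. Below is a minimal sketch of that protocol, assuming a scorer that returns the model's log-likelihood of a target given a source; `contrastive_accuracy` and the toy `score_fn` are illustrative placeholders, not from the paper.

```python
def contrastive_accuracy(examples, score_fn):
    """Each example pairs a source with one correct translation and
    one or more incorrect variants differing only in the contrasted
    phenomenon (e.g. the anaphoric pronoun). The model is credited
    when the correct variant outscores every incorrect one."""
    hits = 0
    for src, correct, variants in examples:
        best = score_fn(src, correct)
        if all(best > score_fn(src, bad) for bad in variants):
            hits += 1
    return hits / len(examples)

examples = [
    ("The cat is hungry. It meows.",
     "Le chat a faim. Il miaule.",        # correct masculine pronoun
     ["Le chat a faim. Elle miaule."]),   # wrong gender
]

# Placeholder scorer that favours the masculine pronoun, only to
# exercise the code path; a real scorer is the NMT model itself.
toy_score = lambda src, tgt: 1.0 if " Il " in tgt else 0.0
print(contrastive_accuracy(examples, toy_score))  # 1.0
```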

2019

Unbabel’s Participation in the WMT19 Translation Quality Estimation Shared Task
Fabio Kepler | Jonay Trénous | Marcos Treviso | Miguel Vera | António Góis | M. Amin Farajian | António V. Lopes | André F. T. Martins
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

We present the contribution of the Unbabel team to the WMT 2019 Shared Task on Quality Estimation. We participated in the word-, sentence-, and document-level tracks, encompassing three language pairs: English-German, English-Russian, and English-French. Our submissions build upon the recent OpenKiwi framework: we combine linear, neural, and predictor-estimator systems with new transfer learning approaches using BERT and XLM pre-trained models. We compare systems individually and propose new ensemble techniques for word- and sentence-level predictions. We also propose a simple technique for converting word labels into document-level predictions. Overall, our submitted systems achieve the best results on all tracks and language pairs by a considerable margin.
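As an illustration of the word-to-document conversion idea, one simple aggregation is to average the per-word P(BAD) estimates over the whole document. This averaging rule is a hedged sketch, not necessarily the paper's exact conversion:

```python
def document_score(word_bad_probs):
    """word_bad_probs: list of sentences, each a list of P(BAD) per
    word. Returns a score in [0, 1]; higher means worse quality."""
    flat = [p for sentence in word_bad_probs for p in sentence]
    return sum(flat) / len(flat) if flat else 0.0

doc = [[0.1, 0.8, 0.2], [0.05, 0.3]]
print(document_score(doc))  # 0.29
```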

Unbabel’s Submission to the WMT2019 APE Shared Task: BERT-Based Encoder-Decoder for Automatic Post-Editing
António V. Lopes | M. Amin Farajian | Gonçalo M. Correia | Jonay Trénous | André F. T. Martins
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

This paper describes Unbabel’s submission to the WMT2019 APE Shared Task for the English-German language pair. Following the recent rise of large, powerful, pre-trained models, we adapt the pre-trained BERT model to perform Automatic Post-Editing in an encoder-decoder framework. Analogously to dual-encoder architectures, we develop a BERT-based encoder-decoder (BED) model in which a single pre-trained BERT encoder receives both the source (src) and machine translation (mt) strings. Furthermore, we explore a conservativeness factor that constrains the APE system to perform fewer edits. As the official results show, when trained on a weighted combination of in-domain and artificial training data, our BED system with the conservativeness penalty significantly improves the translations of a strong NMT system, by -0.78 TER and +1.23 BLEU. Finally, our submission achieves a new state of the art, ex aequo, in English-German APE of NMT.
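The intuition behind such a conservativeness penalty can be sketched at decoding time: token ids not observed in src or mt have their log-probabilities reduced, biasing the decoder toward copying mt and thus making fewer edits. The exact formulation and the `penalty` constant below are illustrative assumptions, not the paper's precise recipe:

```python
import numpy as np

def apply_conservativeness(log_probs, allowed_token_ids, penalty=2.0):
    """log_probs: (vocab_size,) next-token log-probabilities.
    allowed_token_ids: vocabulary ids occurring in src or mt."""
    mask = np.ones_like(log_probs, dtype=bool)
    mask[list(allowed_token_ids)] = False   # keep allowed ids intact
    penalized = log_probs.copy()
    penalized[mask] -= penalty              # push down everything else
    return penalized

# Toy usage: 5-token vocabulary, ids {1, 3} appear in src/mt.
scores = np.log(np.array([0.1, 0.4, 0.2, 0.2, 0.1]))
print(apply_conservativeness(scores, {1, 3}))
```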

2017

Neural vs. Phrase-Based Machine Translation in a Multi-Domain Scenario
M. Amin Farajian | Marco Turchi | Matteo Negri | Nicola Bertoldi | Marcello Federico
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

State-of-the-art neural machine translation (NMT) systems are generally trained on specific domains by carefully selecting the training sets and applying proper domain adaptation techniques. In this paper we consider the real-world scenario in which the target domain is not predefined, so the system must be able to translate text from multiple domains. We compare the performance of a generic NMT system and a phrase-based statistical machine translation (PBMT) system by training them on a generic parallel corpus composed of data from different domains. Our results on multi-domain English-French data show that, in these realistic conditions, PBMT outperforms its neural counterpart. This raises the question: is NMT ready for deployment as a generic, multi-purpose MT backbone in real-world settings?

Multi-Domain Neural Machine Translation through Unsupervised Adaptation
M. Amin Farajian | Marco Turchi | Matteo Negri | Marcello Federico
Proceedings of the Second Conference on Machine Translation

Multi-source Neural Automatic Post-Editing: FBK’s participation in the WMT 2017 APE shared task
Rajen Chatterjee | M. Amin Farajian | Matteo Negri | Marco Turchi | Ankit Srivastava | Santanu Pal
Proceedings of the Second Conference on Machine Translation

2016

WAGS: A Beautiful English-Italian Benchmark Supporting Word Alignment Evaluation on Rare Words
Luisa Bentivogli | Mauro Cettolo | M. Amin Farajian | Marcello Federico
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents WAGS (Word Alignment Gold Standard), a novel benchmark which allows extensive evaluation of word alignment (WA) tools on out-of-vocabulary (OOV) and rare words. WAGS is a subset of the Common Test section of the Europarl English-Italian parallel corpus, and is specifically tailored to OOV and rare words. WAGS is composed of 6,715 sentence pairs containing 11,958 occurrences of OOV and rare words up to frequency 15 in the Europarl Training set (5,080 English words and 6,878 Italian words), representing almost 3% of the whole text. Since WAGS is focused on OOV/rare words, manual alignments are provided for these words only, and not for the whole sentences. Two off-the-shelf word aligners have been evaluated on WAGS, and the results have been compared to those obtained on an existing benchmark tailored to full-text alignment. The results confirm that WAGS is a valuable resource, which allows a statistically sound evaluation of WA systems’ performance on OOV and rare words, as well as extensive data analyses. WAGS is publicly released under a Creative Commons Attribution license.
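A hedged sketch of how such a rare/OOV target-word list could be derived: count token frequencies in the training set and keep test-set tokens that are unseen or at or below the threshold. Whitespace tokenization and the toy data below are simplifying assumptions, not the resource's actual construction pipeline:

```python
from collections import Counter

def rare_and_oov_words(train_lines, test_lines, max_freq=15):
    counts = Counter(tok for line in train_lines for tok in line.split())
    return {tok for line in test_lines for tok in line.split()
            if counts[tok] <= max_freq}     # count 0 means OOV

train = ["the cat sat on the mat", "the dog sat"]
test = ["the zebra sat"]
print(rare_and_oov_words(train, test, max_freq=1))  # {'zebra'}
```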

2014

Online Word Alignment for Online Adaptive Machine Translation
M. Amin Farajian | Nicola Bertoldi | Marcello Federico
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

MT-EQuAl: a Toolkit for Human Assessment of Machine Translation Output
Christian Girardi | Luisa Bentivogli | Mohammad Amin Farajian | Marcello Federico
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations