Annette Rios Gonzales

Also published as: Annette Rios


Domain Robustness in Neural Machine Translation
Mathias Müller | Annette Rios | Rico Sennrich
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)


Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
Gongbo Tang | Mathias Müller | Annette Rios | Rico Sennrich
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attentional networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.

A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation
Mathias Müller | Annette Rios | Elena Voita | Rico Sennrich
Proceedings of the Third Conference on Machine Translation: Research Papers

The translation of pronouns presents a special challenge to machine translation to this day, since it often requires context outside the current sentence. Recent work on models that have access to information across sentence boundaries has seen only moderate improvements in terms of automatic evaluation metrics such as BLEU. However, metrics that quantify the overall translation quality are ill-equipped to measure gains from additional context. We argue that a different kind of evaluation is needed to assess how well models translate inter-sentential phenomena such as pronouns. This paper therefore presents a test suite of contrastive translations focused specifically on the translation of pronouns. Furthermore, we perform experiments with several context-aware models. We show that, while gains in BLEU are moderate for those systems, they outperform baselines by a large margin in terms of accuracy on our contrastive test set. Our experiments also show the effectiveness of parameter tying for multi-encoder architectures.
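
Accuracy on a contrastive test set can be illustrated with a minimal sketch. It assumes the common contrastive setup in which a model is credited for an example when it scores the reference translation higher than every contrastive variant. The function names, the dictionary keys, and the toy scorer are hypothetical and are not taken from the paper or its released data.

```python
from typing import Callable, Sequence


def contrastive_accuracy(
    examples: Sequence[dict],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of examples where the reference outscores every contrastive variant."""
    if not examples:
        raise ValueError("no examples to evaluate")
    correct = 0
    for ex in examples:
        ref_score = score(ex["source"], ex["reference"])
        # The example counts as correct only if the reference beats all variants.
        if all(ref_score > score(ex["source"], c) for c in ex["contrastive"]):
            correct += 1
    return correct / len(examples)


def toy_score(source: str, translation: str) -> float:
    """Hypothetical stand-in for a model score: prefers translations containing 'sie'."""
    return 1.0 if "sie" in translation.lower().split() else 0.0


# Invented toy examples; a real test set would pair each source with its context.
examples = [
    {"source": "It is old.", "reference": "Sie ist alt.",
     "contrastive": ["Er ist alt.", "Es ist alt."]},
    {"source": "It is old.", "reference": "Er ist alt.",
     "contrastive": ["Sie ist alt."]},
]

accuracy = contrastive_accuracy(examples, toy_score)
print(accuracy)
```

In practice `score` would be replaced by a model's log-probability of the translation given the source (and any additional context); the toy scorer only serves to make the sketch runnable.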

The Word Sense Disambiguation Test Suite at WMT18
Annette Rios | Mathias Müller | Rico Sennrich
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present a task to measure an MT system’s capability to translate ambiguous words with their correct sense according to the given context. The task is based on the German–English Word Sense Disambiguation (WSD) test set ContraWSD (Rios Gonzales et al., 2017), but it has been filtered to reduce noise, and the evaluation has been adapted to assess MT output directly rather than scoring existing translations. We evaluate all German–English submissions to the WMT’18 shared translation task, plus a number of submissions from previous years, and find that performance on the task has markedly improved compared to the 2016 WMT submissions (81%→93% accuracy on the WSD task). We also find that the unsupervised submissions to the task have a low WSD capability, and predominantly translate ambiguous source words with the same sense.


Machine Translation of Spanish Personal and Possessive Pronouns Using Anaphora Probabilities
Ngoc Quang Luong | Andrei Popescu-Belis | Annette Rios Gonzales | Don Tuggener
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We implement a fully probabilistic model to combine the hypotheses of a Spanish anaphora resolution system with those of a Spanish-English machine translation system. The probabilities over antecedents are converted into probabilities for the features of translated pronouns, and are integrated with phrase-based MT using an additional translation model for pronouns. The system improves the translation of several Spanish personal and possessive pronouns into English, by solving translation divergences such as ‘ella’ vs. ‘she’/‘it’ or ‘su’ vs. ‘his’/‘her’/‘its’/‘their’. On a test set with 2,286 pronouns, a baseline system correctly translates 1,055 of them, while ours improves this by 41. Moreover, with oracle antecedents, possessives are translated with an accuracy of 83%.
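
The reported counts can be turned into accuracies with a short calculation. This is only a sketch: it assumes that "improves this by 41" means 41 additional pronouns translated correctly, which is one reading of the abstract and not something it states explicitly.

```python
total_pronouns = 2286
baseline_correct = 1055
# Assumption: the improvement of 41 is a count of additional correct pronouns.
extra_correct = 41

baseline_acc = baseline_correct / total_pronouns
improved_acc = (baseline_correct + extra_correct) / total_pronouns
gain_points = (improved_acc - baseline_acc) * 100

print(f"baseline: {baseline_acc:.1%}")
print(f"improved: {improved_acc:.1%}")
print(f"gain: {gain_points:.1f} percentage points")
```

Under that reading, the baseline sits at roughly 46% accuracy and the improvement is a little under two percentage points.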

Co-reference Resolution of Elided Subjects and Possessive Pronouns in Spanish-English Statistical Machine Translation
Annette Rios Gonzales | Don Tuggener
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper presents a straightforward method to integrate co-reference information into phrase-based machine translation to address the problems of i) elided subjects and ii) morphological underspecification of pronouns when translating from pro-drop languages. We evaluate the method for the language pair Spanish-English and find that translation quality improves with the addition of co-reference information.

Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings
Annette Rios Gonzales | Laura Mascarell | Rico Sennrich
Proceedings of the Second Conference on Machine Translation


Morphological Disambiguation and Text Normalization for Southern Quechua Varieties
Annette Rios Gonzales | Richard Alexander Castro Mamani
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects


Machine Learning Disambiguation of Quechua Verb Morphology
Annette Rios Gonzales | Anne Göhring
Proceedings of the Second Workshop on Hybrid Approaches to Translation


A tree is a Baum is an árbol is a sach’a: Creating a trilingual treebank
Annette Rios | Anne Göhring
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the process of constructing a trilingual parallel treebank. While for two of the involved languages, Spanish and German, there are already corpora with well-established annotation schemes available, this is not the case with the third language: Cuzco Quechua (ISO 639-3:quz), a low-resourced, non-standardized language for which we had to define a linguistically plausible annotation scheme first.