Nora Aranberri


2020

pdf bib
With or without you? Effects of using machine translation to write flash fiction in the foreign language
Nora Aranberri
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The improvement in the quality of machine translation (MT) for both majority and minority languages in recent years is resulting in its steady adoption. This is not only happening among professional translators but also among users who occasionally find themselves in situations where translation is required or MT presents itself as a easier means to producing a text. This work sets to explore the effect using MT has in flash fiction produced in the foreign language. Specifically, we study the impact in surface closeness, syntactic and lexical complexity, and edits. Results show that texts produced with MT seem to fit closer to certain traits of the foreign language and that differences in the use of part-of-speech categories and structures emerge. Moreover, the analysis of the post-edited texts reveals that participants approach the editing of the MT output differently, displaying a wide range in the number of edits.

2019

pdf bib
Comparison of temporal, technical and cognitive dimension measurements for post-editing effort
Cristina Cumbreno | Nora Aranberri
Proceedings of the Second MEMENTO workshop on Modelling Parameters of Cognitive Effort in Translation Production

2018

pdf bib
Building Named Entity Recognition Taggers via Parallel Corpora
Rodrigo Agerri | Yiling Chung | Itziar Aldabe | Nora Aranberri | Gorka Labaka | German Rigau
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf bib
Tools and Guidelines for Principled Machine Translation Development
Nora Aranberri | Eleftherios Avramidis | Aljoscha Burchardt | Ondřej Klejch | Martin Popel | Maja Popović
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This work addresses the need to aid Machine Translation (MT) development cycles with a complete workflow of MT evaluation methods. Our aim is to assess, compare and improve MT system variants. We hereby report on novel tools and practices that support various measures, developed in order to support a principled and informed approach of MT development. Our toolkit for automatic evaluation showcases quick and detailed comparison of MT system variants through automatic metrics and n-gram feedback, along with manual evaluation via edit-distance, error annotation and task-based feedback.

pdf bib
TweetMT: A Parallel Microblog Corpus
Iñaki San Vicente | Iñaki Alegría | Cristina España-Bonet | Pablo Gamallo | Hugo Gonçalo Oliveira | Eva Martínez Garcia | Antonio Toral | Arkaitz Zubiaga | Nora Aranberri
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula. The corpus has been created by combining automatic collection and crowdsourcing approaches, and it is publicly available. It is intended for the development and testing of microtext machine translation systems. In this paper we describe the methodology followed to build the corpus, and present the results of the shared task in which it was tested.

pdf bib
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
Arantxa Otegi | Nora Aranberri | Antonio Branco | Jan Hajič | Martin Popel | Kiril Simov | Eneko Agirre | Petya Osenova | Rita Pereira | João Silva | Steven Neale
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus on the IT domain. English is common in all parallel corpora, with translations in five languages, namely, Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, as well as annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer.

pdf bib
Proceedings of the 2nd Workshop on Semantics-Driven Machine Translation (SedMT 2016)
Deyi Xiong | Kevin Duh | Eneko Agirre | Nora Aranberri | Houfeng Wang
Proceedings of the 2nd Workshop on Semantics-Driven Machine Translation (SedMT 2016)

2015

pdf bib
Exploiting portability to build an RBMT prototype for a new source language
Nora Aranberri | Gorka Labaka | Arantza Díaz de Ilarraza | Kepa Sarasola
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation
Dekai Wu | Marine Carpuat | Eneko Agirre | Nora Aranberri
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
SMT error analysis and mapping to syntactic, semantic and structural fixes
Nora Aranberri
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
Exploiting portability to build an RBMT prototype for a new source language
Nora Aranberri | Gorka Labaka | Arantza Díaz de Ilarraza | Kepa Sarasola
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Deep-syntax TectoMT for English-Spanish MT
Gorka Labaka | Oneka Jauregi | Arantza Díaz de Ilarraza | Michael Ustaszewski | Nora Aranberri | Eneko Agirre
Proceedings of the 1st Deep Machine Translation Workshop

2014

pdf bib
TweetNorm_es: an annotated corpus for Spanish microtext normalization
Iñaki Alegria | Nora Aranberri | Pere Comas | Víctor Fresno | Pablo Gamallo | Lluis Padró | Iñaki San Vicente | Jordi Turmo | Arkaitz Zubiaga
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we introduce TweetNorm_es, an annotated corpus of tweets in Spanish language, which we make publicly available under the terms of the CC-BY license. This corpus is intended for development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort from different research groups. In this paper we describe the methodology defined to build the corpus as well as the guidelines followed in the annotation process. We also present a brief overview of the Tweet-Norm shared task, as the first evaluation environment where the corpus was used.