Juan Antonio Pérez-Ortiz


2020

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation
Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Felipe Sánchez-Martínez
Proceedings of the 28th International Conference on Computational Linguistics

This paper studies the effects of word-level linguistic annotations in under-resourced neural machine translation, for which there is incomplete evidence in the literature. The study covers eight language pairs, different training corpus sizes, two architectures, and three types of annotation: dummy tags (with no linguistic information at all), part-of-speech tags, and morpho-syntactic description tags, which consist of part of speech and morphological features. These linguistic annotations are interleaved in the input or output streams as a single tag placed before each word. In order to measure the performance under each scenario, we use automatic evaluation metrics and perform automatic error classification. Our experiments show that, in general, source-language annotations are helpful and morpho-syntactic descriptions outperform part of speech for some language pairs. On the contrary, when words are annotated in the target language, part-of-speech tags systematically outperform morpho-syntactic description tags in terms of automatic evaluation metrics, even though the use of morpho-syntactic description tags improves the grammaticality of the output. We provide a detailed analysis of the reasons behind this result.

An English-Swahili parallel corpus and its use for neural machine translation in the news domain
Felipe Sánchez-Martínez | Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Mikel L. Forcada | Miquel Esplà-Gomis | Andrew Secker | Susie Coleman | Julie Wall
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet. We report the results of a pilot human evaluation performed by the news media organisations participating in the H2020 EU-funded project GoURMET.

2019

The Universitat d’Alacant Submissions to the English-to-Kazakh News Translation Task at WMT 2019
Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Felipe Sánchez-Martínez
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the two submissions of Universitat d’Alacant to the English-to-Kazakh news translation task at WMT 2019. Our submissions take advantage of monolingual data and parallel data from other language pairs by means of iterative backtranslation, pivot backtranslation and transfer learning. They also use linguistic information in two ways: morphological segmentation of Kazakh text, and integration of the output of a rule-based machine translation system. Our systems were ranked second in terms of chrF++ despite being built from an ensemble of only 2 independent training runs.

Global Under-Resourced Media Translation (GoURMET)
Alexandra Birch | Barry Haddow | Ivan Titov | Antonio Valerio Miceli Barone | Rachel Bawden | Felipe Sánchez-Martínez | Mikel L. Forcada | Miquel Esplà-Gomis | Víctor Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Wilker Aziz | Andrew Secker | Peggy van der Kreeft
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

2016

Stand-off Annotation of Web Content as a Legally Safer Alternative to Crawling for Distribution
Mikel L. Forcada | Miquel Esplà-Gomis | Juan Antonio Pérez-Ortiz
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

2015

Evaluating machine translation for assimilation via a gap-filling task
Ekaterina Ageeva | Francis M. Tyers | Mikel L. Forcada | Juan Antonio Pérez-Ortiz
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Evaluating machine translation for assimilation via a gap-filling task
Ekaterina Ageeva | Mikel L. Forcada | Francis M. Tyers | Juan Antonio Pérez-Ortiz
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

Black-box integration of heterogeneous bilingual resources into an interactive translation system
Juan Antonio Pérez-Ortiz | Daniel Torregrosa | Mikel Forcada
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

The UA-Prompsit hybrid machine translation system for the 2014 Workshop on Statistical Machine Translation
Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Felipe Sánchez-Martínez
Proceedings of the Ninth Workshop on Statistical Machine Translation

An efficient method to assist non-expert users in extending dictionaries by assigning stems and inflectional paradigms to unknown words
Miquel Esplà-Gomis | Víctor M. Sánchez-Cartagena | Felipe Sánchez-Martínez | Rafael C. Carrasco | Mikel L. Forcada | Juan Antonio Pérez-Ortiz
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

2012

Source-Language Dictionaries Help Non-Expert Users to Enlarge Target-Language Dictionaries for Machine Translation
Víctor M. Sánchez-Cartagena | Miquel Esplà-Gomis | Juan Antonio Pérez-Ortiz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, previous work on the enlargement of monolingual dictionaries of rule-based machine translation systems by non-expert users is extended to tackle the complete task of adding both source-language and target-language words to the monolingual dictionaries and the bilingual dictionary. In the original method, users validate whether some suffix variations of the word to be inserted are correct in order to find the most appropriate inflection paradigm. This method is now improved by taking advantage of the strong correlation detected between paradigms in both languages to reduce the search space of the target-language paradigm once the source-language paradigm is known. Results show that, when the source-language word has already been inserted, the system predicts the right target-language paradigm more accurately, and the number of queries posed to users is significantly reduced. Experiments also show that, when the source language and the target language are not closely related, only the source-language part-of-speech category, and not the rest of the information provided by the source-language paradigm, helps to correctly classify the target-language word.

2011

Enriching a statistical machine translation system trained on small parallel corpora with rule-based bilingual phrases
Víctor M. Sánchez-Cartagena | Felipe Sánchez-Martínez | Juan Antonio Pérez-Ortiz
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

Enlarging Monolingual Dictionaries for Machine Translation with Active Learning and Non-Expert Users
Miquel Esplà-Gomis | Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

The Universitat d’Alacant hybrid machine translation system for WMT 2011
Víctor M. Sánchez-Cartagena | Felipe Sánchez-Martínez | Juan Antonio Pérez-Ortiz
Proceedings of the Sixth Workshop on Statistical Machine Translation

2005

An open-source shallow-transfer machine translation engine for the Romance languages of Spain
Antonio M. Corbi-Bellot | Mikel L. Forcada | Sergio Ortíz-Rojas | Juan Antonio Pérez-Ortiz | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez | Iñaki Alegria | Aingeru Mayor | Kepa Sarasola
Proceedings of the 10th EAMT Conference: Practical applications of machine translation