Mikel L. Forcada

Also published as: Mikel Forcada


2020

ParaCrawl: Web-Scale Acquisition of Parallel Corpora
Marta Bañón | Pinzhen Chen | Barry Haddow | Kenneth Heafield | Hieu Hoang | Miquel Esplà-Gomis | Mikel L. Forcada | Amir Kamran | Faheem Kirefu | Philipp Koehn | Sergio Ortiz Rojas | Leopoldo Pla Sempere | Gema Ramírez-Sánchez | Elsa Sarrías | Marek Strelec | Brian Thompson | William Waites | Dion Wiggins | Jaume Zaragoza
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software. We empirically compare alternative methods and publish benchmark data sets for sentence alignment and sentence pair filtering. We also describe the parallel corpora released and evaluate their quality and their usefulness to create machine translation systems.

Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
André Martins | Helena Moniz | Sara Fumega | Bruno Martins | Fernando Batista | Luisa Coheur | Carla Parra | Isabel Trancoso | Marco Turchi | Arianna Bisazza | Joss Moorkens | Ana Guerberof | Mary Nurminen | Lena Marg | Mikel L. Forcada
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

A multi-source approach for Breton–French hybrid machine translation
Víctor M. Sánchez-Cartagena | Mikel L. Forcada | Felipe Sánchez-Martínez
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

Corpus-based approaches to machine translation (MT) have difficulties when the amount of parallel corpora to use for training is scarce, especially if the languages involved in the translation are highly inflected. This problem can be addressed from different perspectives, including data augmentation, transfer learning, and the use of additional resources, such as those used in rule-based MT. This paper focuses on the hybridisation of rule-based MT and neural MT for the Breton–French under-resourced language pair in an attempt to study to what extent the rule-based MT resources help improve the translation quality of the neural MT system for this particular under-resourced language pair. We combine both translation approaches in a multi-source neural MT architecture and find out that, even though the rule-based system has a low performance according to automatic evaluation metrics, using it leads to improved translation quality.

An English-Swahili parallel corpus and its use for neural machine translation in the news domain
Felipe Sánchez-Martínez | Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Mikel L. Forcada | Miquel Esplà-Gomis | Andrew Secker | Susie Coleman | Julie Wall
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet. We report the results of a pilot human evaluation performed by the news media organisations participating in the H2020 EU-funded project GoURMET.

2019

Proceedings of Machine Translation Summit XVII Volume 1: Research Track
Mikel Forcada | Andy Way | Barry Haddow | Rico Sennrich
Proceedings of Machine Translation Summit XVII Volume 1: Research Track

Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks
Mikel Forcada | Andy Way | John Tinsley | Dimitar Shterionov | Celia Rico | Federico Gaspari
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

ParaCrawl: Web-scale parallel corpora for the languages of the EU
Miquel Esplà | Mikel Forcada | Gema Ramírez-Sánchez | Hieu Hoang
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

Global Under-Resourced Media Translation (GoURMET)
Alexandra Birch | Barry Haddow | Ivan Tito | Antonio Valerio Miceli Barone | Rachel Bawden | Felipe Sánchez-Martínez | Mikel L. Forcada | Miquel Esplà-Gomis | Víctor Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Wilker Aziz | Andrew Secker | Peggy van der Kreeft
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

2018

Exploring gap filling as a cheaper alternative to reading comprehension questionnaires when evaluating machine translation for gisting
Mikel L. Forcada | Carolina Scarton | Lucia Specia | Barry Haddow | Alexandra Birch
Proceedings of the Third Conference on Machine Translation: Research Papers

A popular application of machine translation (MT) is gisting: MT is consumed as is to make sense of text in a foreign language. Evaluation of the usefulness of MT for gisting is surprisingly uncommon. The classical method uses reading comprehension questionnaires (RCQ), in which informants are asked to answer professionally-written questions in their language about a foreign text that has been machine-translated into their language. Recently, gap-filling (GF), a form of cloze testing, has been proposed as a cheaper alternative to RCQ. In GF, certain words are removed from reference translations and readers are asked to fill the gaps left using the machine-translated text as a hint. This paper reports, for the first time, a comparative evaluation, using both RCQ and GF, of translations from multiple MT systems for the same foreign texts, and a systematic study on the effect of variables such as gap density, gap-selection strategies, and document context in GF. The main findings of the study are: (a) both RCQ and GF clearly identify MT to be useful; (b) global RCQ and GF rankings for the MT systems are mostly in agreement; (c) GF scores vary very widely across informants, making comparisons among MT systems hard, and (d) unlike RCQ, which is framed around documents, GF evaluation can be framed at the sentence level. These findings support the use of GF as a cheaper alternative to RCQ.

Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering
Philipp Koehn | Huda Khayrallah | Kenneth Heafield | Mikel L. Forcada
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems. Seventeen participants from companies, national research labs, and universities participated in this task.

UAlacant machine translation quality estimation at WMT 2018: a simple approach using phrase tables and feed-forward neural networks
Felipe Sánchez-Martínez | Miquel Esplà-Gomis | Mikel L. Forcada
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We describe the Universitat d’Alacant submissions to the word- and sentence-level machine translation (MT) quality estimation (QE) shared task at WMT 2018. Our approach to word-level MT QE builds on previous work to mark the words in the machine-translated sentence as OK or BAD, and is extended to determine if a word or sequence of words need to be inserted in the gap after each word. Our sentence-level submission simply uses the edit operations predicted by the word-level approach to approximate TER. The method presented ranked first in the sub-task of identifying insertions in gaps for three out of the six datasets, and second in the rest of them.

2016

Apertium: a free/open source platform for machine translation and basic language technology
Mikel L. Forcada | Francis M. Tyers
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

Abu-MaTran: automatic building of machine translation
Antonio Toral | Sergio Ortiz Rojas | Mikel Forcada | Nikola Ljubešić | Prokopis Prokopidis
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

Bitextor’s participation in WMT’16: shared task on document alignment
Miquel Esplà-Gomis | Mikel Forcada | Sergio Ortiz-Rojas | Jorge Ferrández-Tordera
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

UAlacant word-level and phrase-level machine translation quality estimation systems at WMT 2016
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel Forcada
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Stand-off Annotation of Web Content as a Legally Safer Alternative to Crawling for Distribution
Mikel L. Forcada | Miquel Esplà-Gomis | Juan Antonio Pérez-Ortiz
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

2015

UAlacant word-level machine translation quality estimation system at WMT 2015
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel Forcada
Proceedings of the Tenth Workshop on Statistical Machine Translation

Using on-line available sources of bilingual information for word-level machine translation quality estimation
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

A general framework for minimizing translation effort: towards a principled combination of translation technologies in computer-aided translation
Mikel L. Forcada | Felipe Sánchez-Martínez
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Evaluating machine translation for assimilation via a gap-filling task
Ekaterina Ageeva | Mikel L. Forcada | Francis M. Tyers | Juan Antonio Pérez-Ortiz
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Unsupervised training of maximum-entropy models for lexical selection in rule-based machine translation
Francis M. Tyers | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Abu-MaTran: Automatic building of Machine Translation
Antonio Toral | Tommi A. Pirinen | Andy Way | Gema Ramírez-Sánchez | Sergio Ortiz Rojas | Raphael Rubino | Miquel Esplà | Mikel L. Forcada | Vassilis Papavassiliou | Prokopis Prokopidis | Nikola Ljubešić
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

Black-box integration of heterogeneous bilingual resources into an interactive translation system
Juan Antonio Pérez-Ortiz | Daniel Torregrosa | Mikel Forcada
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing
Jorge Baptista | Pushpak Bhattacharyya | Christiane Fellbaum | Mikel Forcada | Chu-Ren Huang | Svetla Koeva | Cvetana Krstev | Eric Laporte
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

An efficient method to assist non-expert users in extending dictionaries by assigning stems and inflectional paradigms to unknown words
Miquel Esplà-Gomis | Víctor M. Sánchez-Cartagena | Felipe Sánchez-Martínez | Rafael C. Carrasco | Mikel L. Forcada | Juan Antonio Pérez-Ortiz
Proceedings of the 17th Annual conference of the European Association for Machine Translation

On the annotation of TMX translation memories for advanced leveraging in computer-aided translation
Mikel Forcada
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The term advanced leveraging refers to extensions beyond the current usage of translation memory (TM) in computer-aided translation (CAT). One of these extensions is the ability to identify and use matches on the sub-segment level (for instance, using sub-sentential elements when segments are sentences) to help the translator when a reasonable fuzzy-matched proposal is not available; some such functionalities have started to become available in commercial CAT tools. Resources such as statistical word aligners, external machine translation systems, glossaries and term bases could be used to identify and annotate segment-level translation units at the sub-segment level, but there is currently no single, agreed standard supporting the interchange of sub-segmental annotation of translation memories to create a richer translation resource. This paper discusses the capabilities and limitations of some current standards, envisages possible alternatives, and ends with a tentative proposal which slightly abuses (repurposes) the usage of existing elements in the TMX standard.

2012

Flexible finite-state lexical selection for rule-based machine translation
Francis M. Tyers | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 16th Annual conference of the European Association for Machine Translation

UAlacant: Using Online Machine Translation for Cross-Lingual Textual Entailment
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel L. Forcada
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Proceedings of the 15th Annual conference of the European Association for Machine Translation
Mikel L. Forcada | Heidi Depraetere | Vincent Vandeghinste
Proceedings of the 15th Annual conference of the European Association for Machine Translation

Using word alignments to assist computer-aided translation users by marking which target-side words to change or keep unedited
Miquel Esplà | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 15th Annual conference of the European Association for Machine Translation

Using Example-Based MT to Support Statistical MT when Translating Homogeneous Data in a Resource-Poor Setting
Sandipan Dandapat | Sara Morrissey | Andy Way | Mikel L. Forcada
Proceedings of the 15th Annual conference of the European Association for Machine Translation

2010

MATREX: The DCU MT System for WMT 2010
Sergio Penkale | Rejwanul Haque | Sandipan Dandapat | Pratyush Banerjee | Ankit K. Srivastava | Jinhua Du | Pavel Pecina | Sudip Kumar Naskar | Mikel L. Forcada | Andy Way
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2005

An open-source shallow-transfer machine translation engine for the Romance languages of Spain
Antonio M. Corbi-Bellot | Mikel L. Forcada | Sergio Ortíz-Rojas | Juan Antonio Pérez-Ortiz | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez | Iñaki Alegria | Aingeru Mayor | Kepa Sarasola
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

LIHLA: Shared Task System Description
Helena M. Caseli | Maria G. V. Nunes | Mikel L. Forcada
Proceedings of the ACL Workshop on Building and Using Parallel Texts

2002

Incremental Construction and Maintenance of Minimal Finite-State Automata
Rafael C. Carrasco | Mikel L. Forcada
Computational Linguistics, Volume 28, Number 2, June 2002

Explaining real MT to translators: between compositional semantics and word-for-word
Mikel L. Forcada
Proceedings of the 6th EAMT Workshop: Teaching Machine Translation
