Francisco Casacuberta

Also published as: F. Casacuberta


2020

A User Study of the Incremental Learning in NMT
Miguel Domingo | Mercedes García-Martínez | Álvaro Peris | Alexandre Helle | Amando Estela | Laurent Bié | Francisco Casacuberta | Manuel Herranz
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

In the translation industry, human experts usually supervise and post-edit machine translation hypotheses. Adaptive neural machine translation systems, able to incrementally update the underlying models under an online learning regime, have proven useful for improving the efficiency of this workflow. However, this incremental adaptation is somewhat unstable, and it may lead to undesirable side effects. One of them is the sporadic appearance of made-up words, as a byproduct of an erroneous application of subword segmentation techniques. In this work, we extend previous studies on on-the-fly adaptation of neural machine translation systems. We perform a user study involving professional, experienced post-editors, delving deeper into the aforementioned problems. Results show that adaptive systems were able to learn how to generate the correct translation for task-specific terms, resulting in an improvement of the user’s productivity. We also observed a close similarity, in terms of morphology, between made-up words and the words that were expected.

NICE: Neural Integrated Custom Engines
Daniel Marín Buj | Daniel Ibáñez García | Zuzanna Parcheta | Francisco Casacuberta
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

In this paper, we present a machine translation system implemented by the Translation Centre for the Bodies of the European Union (CdT). The main goal of this project is to create domain-specific machine translation engines in order to support machine translation services and applications for the Translation Centre’s clients. We explain the entire implementation process of NICE: Neural Integrated Custom Engines. We describe the problems identified and the solutions provided, and present the final results for different language pairs. Finally, we describe the work that will be done on this project in the future.

2019

Demonstration of a Neural Machine Translation System with Online Learning for Translators
Miguel Domingo | Mercedes García-Martínez | Amando Estela Pastor | Laurent Bié | Alexander Helle | Álvaro Peris | Francisco Casacuberta | Manuel Herranz Pérez
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present a demonstration of our system, which implements online learning for neural machine translation in a production environment. These techniques allow the system to continuously learn from the corrections provided by the translators. We implemented an end-to-end platform integrating our machine translation servers with one of the most common user interfaces for professional translators: SDL Trados Studio. We aim to save post-editing effort, as the machine is continuously learning from its mistakes and adapting the models to a specific domain or user style.

A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks
Álvaro Peris | Francisco Casacuberta
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present a demonstration of a neural interactive-predictive system for tackling multimodal sequence to sequence tasks. The system generates text predictions for different sequence to sequence tasks: machine translation, image and video captioning. These predictions are revised by a human agent, who introduces corrections in the form of characters. The system reacts to each correction, providing alternative hypotheses, complying with the feedback provided by the user. The final objective is to reduce the human effort required during this correction process. This system is implemented following a client-server architecture. For accessing the system, we developed a website, which communicates with the neural model, hosted on a local server. From this website, the different tasks can be tackled following the interactive-predictive framework. We open-source all the code developed for building this system. The demonstration is hosted at http://casmacat.prhlt.upv.es/interactive-seq2seq.

Filtering of Noisy Parallel Corpora Based on Hypothesis Generation
Zuzanna Parcheta | Germán Sanchis-Trilles | Francisco Casacuberta
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

The filtering task of noisy parallel corpora in WMT2019 challenges participants to create filtering methods that are useful for training machine translation systems. In this work, we introduce a noisy parallel corpora filtering system based on generating hypotheses by means of a translation model. We train translation models for both language pairs, Nepali–English and Sinhala–English, using the provided parallel corpora. We select the training subset for three language pairs (Nepali, Sinhala and Hindi to English) jointly using bilingual cross-entropy selection to create the best possible translation model for both language pairs. Once the translation models are trained, we translate the noisy corpora and generate a hypothesis for each sentence pair. We compute the smoothed BLEU score between the target sentence and the generated hypothesis. In addition, we apply several rules to discard very noisy or inadequate sentences which can lower the translation score. These heuristics are based on sentence length, source and target similarity and source language detection. We compare our results with the baseline published on the shared task website, which uses the Zipporah model, over which we achieve significant improvements in one of the conditions in the shared task. The designed filtering system is domain independent and all experiments are conducted using neural machine translation.

Incremental Adaptation of NMT for Professional Post-editors: A User Study
Miguel Domingo | Mercedes García-Martínez | Álvaro Peris | Alexandre Helle | Amando Estela | Laurent Bié | Francisco Casacuberta | Manuel Herranz
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

2018

Active Learning for Interactive Neural Machine Translation of Data Streams
Álvaro Peris | Francisco Casacuberta
Proceedings of the 22nd Conference on Computational Natural Language Learning

We study the application of active learning techniques to the translation of unbounded data streams via interactive neural machine translation. The main idea is to select, from an unbounded stream of source sentences, those worth supervising by a human agent. The user will interactively translate those samples. Once validated, these data are useful for adapting the neural machine translation model. We propose two novel methods for selecting the samples to be validated. We exploit the information from the attention mechanism of a neural machine translation system. Our experiments show that the inclusion of active learning techniques into this pipeline makes it possible to reduce the effort required during the process, while increasing the quality of the translation system. Moreover, it makes it possible to balance the human effort required for achieving a certain translation quality. Finally, our neural system outperforms classical approaches by a large margin.

2017

Adapting Neural Machine Translation with Parallel Synthetic Data
Mara Chinea-Ríos | Álvaro Peris | Francisco Casacuberta
Proceedings of the Second Conference on Machine Translation

2016

Beyond Prefix-Based Interactive Translation Prediction
Jesús González-Rubio | Daniel Ortiz-Martínez | Francisco Casacuberta | José Miguel Benedi Ruiz
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

Interactive-Predictive Translation Based on Multiple Word-Segments
Miguel Domingo | Alvaro Peris | Francisco Casacuberta
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

2014

CASMACAT: A Computer-assisted Translation Workbench
Vicent Alabau | Christian Buck | Michael Carl | Francisco Casacuberta | Mercedes García-Martínez | Ulrich Germann | Jesús González-Rubio | Robin Hill | Philipp Koehn | Luis Leiva | Bartolomé Mesa-Lao | Daniel Ortiz-Martínez | Herve Saint-Amand | Germán Sanchis Trilles | Chara Tsoukala
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

The New Thot Toolkit for Fully-Automatic and Interactive Statistical Machine Translation
Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

Inference of Phrase-Based Translation Models via Minimum Description Length
Jesús González-Rubio | Francisco Casacuberta
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation
Ulrich Germann | Michael Carl | Philipp Koehn | Germán Sanchis-Trilles | Francisco Casacuberta | Robin Hill | Sharon O’Brien
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

Efficient wordgraph for interactive translation prediction
Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the 17th Annual conference of the European Association for Machine Translation

CASMACAT: cognitive analysis and statistical methods for advanced computer aided translation
Philipp Koehn | Michael Carl | Francisco Casacuberta | Eva Marcos
Proceedings of the 17th Annual conference of the European Association for Machine Translation

Evaluating the effects of interactivity in a post-editing workbench
Nancy Underwood | Bartolomé Mesa-Lao | Mercedes García Martínez | Michael Carl | Vicent Alabau | Jesús González-Rubio | Luis A. Leiva | Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes the field trial and subsequent evaluation of a post-editing workbench which is currently under development in the EU-funded CasMaCat project. Based on user evaluations of the initial prototype of the workbench, this second prototype of the workbench includes a number of interactive features designed to improve productivity and user satisfaction. Using CasMaCat’s own facilities for logging keystrokes and eye tracking, data were collected from nine post-editors in a professional setting. These data were then used to investigate the effects of the interactive features on productivity, quality, user satisfaction and cognitive load as reflected in the post-editors’ gaze activity. These quantitative results are combined with the qualitative results derived from user questionnaires and interviews conducted with all the participants.

Online optimisation of log-linear weights in interactive machine translation
Mara Chinea Rios | Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Whenever the quality provided by a machine translation system is not enough, a human expert is required to correct the sentences provided by the machine translation system. In such a setup, it is crucial that the system is able to learn from the errors that have already been corrected. In this paper, we analyse the applicability of discriminative ridge regression for learning the log-linear weights of a state-of-the-art machine translation system underlying an interactive machine translation framework, with encouraging results.

2013

Interactive Machine Translation using Hierarchical Translation Models
Jesús González-Rubio | Daniel Ortiz-Martínez | José-Miguel Benedí | Francisco Casacuberta
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

User Evaluation of Interactive Machine Translation Systems
Vicent Alabau | Luis A. Leiva | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the 16th Annual conference of the European Association for Machine Translation

PRHLT Submission to the WMT12 Quality Estimation Task
Jesús González Rubio | Alberto Sanchis | Francisco Casacuberta
Proceedings of the Seventh Workshop on Statistical Machine Translation

Finite-State Acoustic and Translation Model Composition in Statistical Speech Translation: Empirical Assessment
Alicia Pérez | M. Inés Torres | Francisco Casacuberta
Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing

Does more data always yield better translations?
Guillem Gascó | Martha-Alicia Rocha | Germán Sanchis-Trilles | Jesús Andrés-Ferrer | Francisco Casacuberta
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

Active learning for interactive machine translation
Jesús González-Rubio | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

The UPV-PRHLT combination system for WMT 2011
Jesús González-Rubio | Francisco Casacuberta
Proceedings of the Sixth Workshop on Statistical Machine Translation

Stochastic K-TSS Bi-Languages for Machine Translation
M. Inés Torres | Francisco Casacuberta
Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing

Minimum Bayes-risk System Combination
Jesús González-Rubio | Alfons Juan | Francisco Casacuberta
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Improving On-line Handwritten Recognition using Translation Models in Multimodal Interactive Machine Translation
Vicent Alabau | Alberto Sanchis | Francisco Casacuberta
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

An Interactive Machine Translation System with Online Learning
Daniel Ortiz-Martínez | Luis A. Leiva | Vicent Alabau | Ismael García-Varea | Francisco Casacuberta
Proceedings of the ACL-HLT 2011 System Demonstrations

2010

Online Learning for Interactive Statistical Machine Translation
Daniel Ortiz-Martínez | Ismael García-Varea | Francisco Casacuberta
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Potential scope of a fully-integrated architecture for speech translation
Alicia Pérez | María Inés Torres | Francisco Casacuberta
Proceedings of the 14th Annual conference of the European Association for Machine Translation

On the Use of Confidence Measures within an Interactive-predictive Machine Translation System
Jesús González-Rubio | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the 14th Annual conference of the European Association for Machine Translation

UPV-PRHLT English–Spanish System for WMT10
Germán Sanchis-Trilles | Jesús Andrés-Ferrer | Guillem Gascó | Jesús González-Rubio | Pascual Martínez-Gómez | Martha-Alicia Rocha | Joan-Andreu Sánchez | Francisco Casacuberta
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

The UPV-PRHLT Combination System for WMT 2010
Jesús González-Rubio | Germán Sanchis-Trilles | Joan-Andreu Sánchez | Jesús Andrés-Ferrer | Guillem Gascó | Pascual Martínez-Gómez | Martha-Alicia Rocha | Francisco Casacuberta
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Log-linear weight optimisation via Bayesian Adaptation in Statistical Machine Translation
Germán Sanchis-Trilles | Francisco Casacuberta
Coling 2010: Posters

Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
Jesús González-Rubio | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the ACL 2010 Conference Short Papers

Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT
Jesús González-Rubio | Jorge Civera | Alfons Juan | Francisco Casacuberta
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Currently, a great effort is being made to digitalise large historical document collections for preservation purposes. The documents in these collections are usually written in ancient languages, such as Latin or Greek, which limits the access of the general public to their content due to the language barrier. Therefore, digital libraries aim not only at storing raw images of digitalised documents, but also at annotating them with their corresponding text transcriptions and translations into modern languages. Unfortunately, ancient languages have at their disposal scarce electronic resources to be exploited by natural language processing techniques. This paper describes the compilation process of a novel Latin-Catalan parallel corpus as a new task for statistical machine translation (SMT). Preliminary experimental results are also reported using a state-of-the-art phrase-based SMT system. The results presented in this work reveal the complexity of the task and its challenging but interesting nature for future development.

2009

Interactive Machine Translation Based on Partial Statistical Phrase-based Alignments
Daniel Ortiz-Martínez | Ismael García-Varea | Francisco Casacuberta
Proceedings of the International Conference RANLP-2009

GREAT: A Finite-State Machine Translation Toolkit Implementing a Grammatical Inference Approach for Transducer Inference (GIATI)
Jorge González | Francisco Casacuberta
Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference

Statistical Approaches to Computer-Assisted Translation
Sergio Barrachina | Oliver Bender | Francisco Casacuberta | Jorge Civera | Elsa Cubel | Shahram Khadivi | Antonio Lagarda | Hermann Ney | Jesús Tomás | Enrique Vidal | Juan-Miguel Vilar
Computational Linguistics, Volume 35, Number 1, March 2009

Statistical Post-Editing of a Rule-Based Machine Translation System
Antonio-L. Lagarda | Vicent Alabau | Francisco Casacuberta | Roberto Silva | Enrique Díaz-de-Liaño
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

2008

A finite-state framework for log-linear models in machine translation
Jorge González | Francisco Casacuberta
Proceedings of the 12th Annual conference of the European Association for Machine Translation

A novel alignment model inspired on IBM Model 1
Jesús González-Rubio | Germán Sanchis-Trilles | Alfons Juan | Francisco Casacuberta
Proceedings of the 12th Annual conference of the European Association for Machine Translation

Applying boosting to statistical machine translation
Antonio L. Lagarda | Francisco Casacuberta
Proceedings of the 12th Annual conference of the European Association for Machine Translation

Phrase-level alignment generation using a smoothed loglinear phrase-based statistical alignment model
Daniel Ortiz-Martínez | Ismael García-Varea | Francisco Casacuberta
Proceedings of the 12th Annual conference of the European Association for Machine Translation

Improving Interactive Machine Translation via Mouse Actions
Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Jorge Civera | Francisco Casacuberta | Enrique Vidal | Hieu Hoang
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

An Integrated Architecture for Speech-Input Multi-Target Machine Translation
Alicia Pérez | M. Teresa González | M. Inés Torres | Francisco Casacuberta
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

Speech-Input Multi-Target Machine Translation
Alicia Pérez | M. Teresa González | M. Inés Torres | Francisco Casacuberta
Proceedings of the Second Workshop on Statistical Machine Translation

2006

Statistical Phrase-Based Models for Interactive Computer-Assisted Translation
Jesús Tomás | Francisco Casacuberta
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Generalized Stack Decoding Algorithms for Statistical Machine Translation
Daniel Ortiz Martínez | Ismael García Varea | Francisco Casacuberta
Proceedings on the Workshop on Statistical Machine Translation

A Computer-Assisted Translation Tool based on Finite-State Technology
Jorge Civera | Antonio L. Lagarda | Elsa Cubel | Francisco Casacuberta | Enrique Vidal | Juan M. Vilar | Sergio Barrachina
Proceedings of the 11th Annual conference of the European Association for Machine Translation

2004

Machine Translation with Inferred Stochastic Finite-State Transducers
Francisco Casacuberta | Enrique Vidal
Computational Linguistics, Volume 30, Number 2, June 2004

Translation Memories Enrichment by Statistical Bilingual Segmentation
Francisco Nevado | Francisco Casacuberta | Josu Landa
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

A majority of Machine Aided Translation systems are based on comparisons between a source sentence and reference sentences stored in Translation Memories (TMs). The translation search is done by looking for sentences in a database which are similar to the source sentence. TMs have two basic limitations: the dependency on the repetition of complete sentences and the high cost of building a TM. As human translators do not only remember sentences from their preceding translations, but they also decompose the sentence to be translated and work with smaller units, it would be desirable to enrich the TM database with smaller translation units. This enrichment should also be automatic in order not to increase the cost of building a TM. We propose the application of two automatic bilingual segmentation techniques based on statistical translation methods in order to create new, shorter bilingual segments to be included in a TM database. An evaluation of the two techniques is carried out for a bilingual Basque-Spanish task.

From Machine Translation to Computer Assisted Translation using Finite-State Models
Jorge Civera | Elsa Cubel | Antonio L. Lagarda | David Picó | Jorge González | Enrique Vidal | Francisco Casacuberta | Juan M. Vilar | Sergio Barrachina
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

Parallel Corpora Segmentation Using Anchor Words
Francisco Nevado | Francisco Casacuberta | Enrique Vidal
Proceedings of the 7th International EAMT workshop on MT and other language technology tools, Improving MT through other language technology tools, Resource and tools for building MT at EACL 2003

A Quantitative Method for Machine Translation Evaluation
Jesús Tomás | Josep Àngel Mas | Francisco Casacuberta
Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable?

Adapting finite-state translation to the TransType2 project
Elsa Cubel | Jorge González | Antonio Lagarda | Francisco Casacuberta | Alfons Juan | Enrique Vidal
EAMT Workshop: Improving MT through other language technology tools: resources and tools for building MT

2002

Architectures for Speech-to-Speech Translation Using Finite-state Models
Francisco Casacuberta | Enrique Vidal | Juan Miguel Vilar
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

Improving Alignment Quality in Statistical Machine Translation Using Context-dependent Maximum Entropy Models
Ismael García Varea | Franz J. Och | Hermann Ney | Francisco Casacuberta
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach
Ismael García-Varea | Franz J. Och | Hermann Ney | Francisco Casacuberta
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

1997

Using Categories in the EUTRANS System
J. C. Amengual | J. M. Benedí | F. Casacuberta | A. Castaño | A. Castellanos | D. Llorens | A. Marzal | F. Prat | E. Vidal | J. M. Vilar
Spoken Language Translation