Fabio Kepler

Also published as: Fabio Natanael Kepler, Fabio N. Kepler, F. Kepler


OpenKiwi: An Open Source Framework for Quality Estimation
Fabio Kepler | Jonay Trénous | Marcos Treviso | Miguel Vera | André F. T. Martins
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We introduce OpenKiwi, a PyTorch-based open-source framework for translation quality estimation. OpenKiwi supports training and testing of word-level and sentence-level quality estimation systems, implementing the winning systems of the WMT 2015–18 quality estimation campaigns. We benchmark OpenKiwi on two datasets from WMT 2018 (English-German SMT and NMT), yielding state-of-the-art performance on the word-level tasks and near state-of-the-art performance on the sentence-level tasks.

Unbabel’s Participation in the WMT19 Translation Quality Estimation Shared Task
Fabio Kepler | Jonay Trénous | Marcos Treviso | Miguel Vera | António Góis | M. Amin Farajian | António V. Lopes | André F. T. Martins
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

We present the contribution of the Unbabel team to the WMT 2019 Shared Task on Quality Estimation. We participated in the word, sentence, and document-level tracks, encompassing 3 language pairs: English-German, English-Russian, and English-French. Our submissions build upon the recent OpenKiwi framework: we combine linear, neural, and predictor-estimator systems with new transfer learning approaches using BERT and XLM pre-trained models. We compare systems individually and propose new ensemble techniques for word and sentence-level predictions. We also propose a simple technique for converting word labels into document-level predictions. Overall, our submitted systems achieve the best results on all tracks and language pairs by a considerable margin.


Pushing the Limits of Translation Quality Estimation
André F. T. Martins | Marcin Junczys-Dowmunt | Fabio N. Kepler | Ramón Astudillo | Chris Hokamp | Roman Grundkiewicz
Transactions of the Association for Computational Linguistics, Volume 5

Translation quality estimation is a task of growing importance in NLP, due to its potential to reduce post-editing human effort in disruptive ways. However, this potential is currently limited by the relatively low accuracy of existing systems. In this paper, we achieve remarkable improvements by exploiting synergies between the related tasks of word-level quality estimation and automatic post-editing. First, we stack a new, carefully engineered, neural model into a rich feature-based word-level quality estimation system. Then, we use the output of an automatic post-editing system as an extra feature, obtaining striking results on WMT16: a word-level F1-MULT score of 57.47% (an absolute gain of +7.95% over the current state of the art), and a Pearson correlation score of 65.56% for sentence-level HTER prediction (an absolute gain of +13.36%).

Unbabel’s Participation in the WMT17 Translation Quality Estimation Shared Task
André F. T. Martins | Fabio Kepler | José Monteiro
Proceedings of the Second Conference on Machine Translation

Fusion of Simple Models for Native Language Identification
Fabio Kepler | Ramon Astudillo | Alberto Abad
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

In this paper we describe the approaches we explored for the 2017 Native Language Identification shared task. We focused on simple word and sub-word units, avoiding heavy use of hand-crafted features. Following recent trends, we explored linear and neural network models to attempt to compensate for the lack of rich feature use. Initial efforts yielded F1 scores of 82.39% and 83.77% on the development and test sets of the fusion track, and were officially submitted to the task as team L2F. After the task was closed, we carried out further experiments and relied on a late fusion strategy for combining our simple proposed approaches with modifications of the baselines provided by the task. As expected, the i-vector-based sub-system dominates the performance of the system combinations and is the major contributor to our achieved scores. Our best combined system achieves F1 scores of 90.1% and 90.2% on the development and test sets of the fusion track, respectively.


A Web Tool for Building Parallel Corpora of Spoken and Sign Languages
Alex Becker | Fabio Kepler | Sara Candeias
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we describe our work in building an online tool for manually annotating texts in any spoken language with SignWriting in any sign language. The existence of such a tool will allow the creation of parallel corpora between spoken and sign languages that can be used to bootstrap the creation of efficient tools for the Deaf community. As an example, a parallel corpus between English and American Sign Language could be used for training Machine Learning models for automatic translation between the two languages. Clearly, this kind of tool must be designed so that it eases the task of human annotators, not only by being easy to use, but also by giving smart suggestions as the annotation progresses, in order to save time and effort. By building a collaborative, online, easy-to-use annotation tool for building parallel corpora between spoken and sign languages, we aim to help the development of proper resources for sign languages that can then be used in state-of-the-art models currently used in tools for spoken languages. There are several issues and difficulties in creating this kind of resource, and our presented tool already deals with some of them, such as adequate text representation of a sign and many-to-many alignments between words and signs.

Unbabel’s Participation in the WMT16 Word-Level Translation Quality Estimation Shared Task
André F. T. Martins | Ramón Astudillo | Chris Hokamp | Fabio Kepler
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers


Variable-Length Markov Models and Ambiguous Words in Portuguese
Fabio Natanael Kepler | Marcelo Finger
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

An Integrated Tool for Annotating Historical Corpora
Pablo Picasso Feliciano de Faria | Fabio Natanael Kepler | Maria Clara Paixão de Sousa
Proceedings of the Fourth Linguistic Annotation Workshop


A novel Textual Encoding paradigm based on Semantic Web tools and semantics
G. Tummarello | C. Morbidoni | F. Kepler | F. Piazza | P. Puliti
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we perform a preliminary evaluation of how Semantic Web technologies such as RDF and OWL can be used to perform textual encoding. Among the potential advantages, we note that RDF, given its conceptual graph structure, appears naturally suited to deal with overlapping hierarchies of annotations, something notoriously problematic using classic XML-based markup. To conclude, we show how complex querying can be performed using slight modifications of already existing Semantic Web query tools.