Eleftherios Avramidis


2019

pdf bib
Linguistic Evaluation of German-English Machine Translation Using a Test Suite
Eleftherios Avramidis | Vivien Macketanz | Ursula Strohriegel | Hans Uszkoreit
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

We present the results of the application of a grammatical test suite for German-to-English MT on the systems submitted at WMT19, with a detailed analysis for 107 phenomena organized in 14 categories. The systems still translate wrong one out of four test items in average. Low performance is indicated for idioms, modals, pseudo-clefts, multi-word expressions and verb valency. When compared to last year, there has been a improvement of function words, non verbal agreement and punctuation. More detailed conclusions about particular systems and phenomena are also presented.

pdf bib
Train, Sort, Explain: Learning to Diagnose Translation Models
Robert Schwarzenberg | David Harbecke | Vivien Macketanz | Eleftherios Avramidis | Sebastian Möller
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

Evaluating translation models is a trade-off between effort and detail. On the one end of the spectrum there are automatic count-based methods such as BLEU, on the other end linguistic evaluations by humans, which arguably are more informative but also require a disproportionately high effort. To narrow the spectrum, we propose a general approach on how to automatically expose systematic differences between human and machine translations to human experts. Inspired by adversarial settings, we train a neural text classifier to distinguish human from machine translations. A classifier that performs and generalizes well after training should recognize systematic differences between the two classes, which we uncover with neural explainability methods. Our proof-of-concept implementation, DiaMaT, is open source. Applied to a dataset translated by a state-of-the-art neural Transformer model, DiaMaT achieves a classification accuracy of 75% and exposes meaningful differences between humans and the Transformer, amidst the current discussion about human parity.

2018

pdf bib
Fine-grained evaluation of Quality Estimation for Machine translation based on a linguistically motivated Test Suite
Eleftherios Avramidis | Vivien Macketanz | Arle Lommel | Hans Uszkoreit
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing

pdf bib
Fine-grained evaluation of German-English Machine Translation based on a Test Suite
Vivien Macketanz | Eleftherios Avramidis | Aljoscha Burchardt | Hans Uszkoreit
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 linguistic phenomena in 14 categories, with an increased focus on verb tenses, aspects and moods. The MT outputs are evaluated in a semi-automatic way through regular expressions that focus only on the part of the sentence that is relevant to each phenomenon. Through our analysis, we are able to compare systems based on their performance on these categories. Additionally, we reveal strengths and weaknesses of particular systems and we identify grammatical phenomena where the overall performance of MT is relatively low.

2017

pdf bib
Sentence-level quality estimation by predicting HTER as a multi-component metric
Eleftherios Avramidis
Proceedings of the Second Conference on Machine Translation

2016

pdf bib
Tools and Guidelines for Principled Machine Translation Development
Nora Aranberri | Eleftherios Avramidis | Aljoscha Burchardt | Ondřej Klejch | Martin Popel | Maja Popović
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This work addresses the need to aid Machine Translation (MT) development cycles with a complete workflow of MT evaluation methods. Our aim is to assess, compare and improve MT system variants. We hereby report on novel tools and practices that support various measures, developed in order to support a principled and informed approach of MT development. Our toolkit for automatic evaluation showcases quick and detailed comparison of MT system variants through automatic metrics and n-gram feedback, along with manual evaluation via edit-distance, error annotation and task-based feedback.

pdf bib
DFKI’s system for WMT16 IT-domain task, including analysis of systematic errors
Eleftherios Avramidis | Aljoscha Burchardt | Vivien Macketanz | Ankit Srivastava
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Deeper Machine Translation and Evaluation for German
Eleftherios Avramidis | Vivien Macketanz | Aljoscha Burchardt | Jindrich Helcl | Hans Uszkoreit
Proceedings of the 2nd Deep Machine Translation Workshop

2015

pdf bib
Poor man’s lemmatisation for automatic error classification
Maja Popovic | Mihael Arcan | Eleftherios Avramidis | Aljoscha Burchardt
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
DFKI’s experimental hybrid MT system for WMT 2015
Eleftherios Avramidis | Maja Popović | Aljoscha Burchardt
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Poor man’s lemmatisation for automatic error classification
Maja Popović | Mihael Arčan | Eleftherios Avramidis | Aljoscha Burchardt | Arle Lommel
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Towards Deeper MT - A Hybrid System for German
Eleftherios Avramidis | Aljoscha Burchardt | Maja Popović | Hans Uszkoreit
Proceedings of the 1st Deep Machine Translation Workshop

2014

pdf bib
Efforts on Machine Learning over Human-mediated Translation Edit Rate
Eleftherios Avramidis
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Correlating decoding events with errors in Statistical Machine Translation
Eleftherios Avramidis | Maja Popović
Proceedings of the 11th International Conference on Natural Language Processing

pdf bib
Using a new analytic measure for the annotation and analysis of MT errors on real data
Arle Lommel | Aljoscha Burchardt | Maja Popović | Kim Harris | Eleftherios Avramidis | Hans Uszkoreit
Proceedings of the 17th Annual conference of the European Association for Machine Translation

pdf bib
Relations between different types of post-editing operations, cognitive effort and temporal effort
Maja Popović | Arle Lommel | Aljoscha Burchardt | Eleftherios Avramidis | Hans Uszkoreit
Proceedings of the 17th Annual conference of the European Association for Machine Translation

pdf bib
The tara corpus of human-annotated machine translations
Eleftherios Avramidis | Aljoscha Burchardt | Sabine Hunsicker | Maja Popović | Cindy Tscherwinka | David Vilar | Hans Uszkoreit
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Human translators are the key to evaluating machine translation (MT) quality and also to addressing the so far unanswered question when and how to use MT in professional translation workflows. This paper describes the corpus developed as a result of a detailed large scale human evaluation consisting of three tightly connected tasks: ranking, error classification and post-editing.

2013

pdf bib
Selecting Feature Sets for Comparative and Time-Oriented Quality Estimation of Machine Translation Output
Eleftherios Avramidis | Maja Popović
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

pdf bib
Involving Language Professionals in the Evaluation of Machine Translation
Eleftherios Avramidis | Aljoscha Burchardt | Christian Federmann | Maja Popović | Cindy Tscherwinka | David Vilar
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Significant breakthroughs in machine translation only seem possible if human translators are taken into the loop. While automatic evaluation and scoring mechanisms such as BLEU have enabled the fast development of systems, it is not clear how systems can meet real-world (quality) requirements in industrial translation scenarios today. The taraXÜ project paves the way for wide usage of hybrid machine translation outputs through various feedback loops in system development. In a consortium of research and industry partners, the project integrates human translators into the development process for rating and post-editing of machine translation outputs thus collecting feedback for possible improvements.

pdf bib
A Richly Annotated, Multilingual Parallel Corpus for Hybrid Machine Translation
Eleftherios Avramidis | Marta R. Costa-jussà | Christian Federmann | Josef van Genabith | Maite Melero | Pavel Pecina
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In recent years, machine translation (MT) research has focused on investigating how hybrid machine translation as well as system combination approaches can be designed so that the resulting hybrid translations show an improvement over the individual “component” translations. As a first step towards achieving this objective we have developed a parallel corpus with source text and the corresponding translation output from a number of machine translation engines, annotated with metadata information, capturing aspects of the translation process performed by the different MT systems. This corpus aims to serve as a basic resource for further research on whether hybrid machine translation algorithms and system combination techniques can benefit from additional (linguistically motivated, decoding, and runtime) information provided by the different systems involved. In this paper, we describe the annotated corpus we have created. We provide an overview on the component MT systems and the XLIFF-based annotation format we have developed. We also report on first experiments with the ML4HMT corpus data.

pdf bib
The ML4HMT Workshop on Optimising the Division of Labour in Hybrid Machine Translation
Christian Federmann | Eleftherios Avramidis | Marta R. Costa-jussà | Josef van Genabith | Maite Melero | Pavel Pecina
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe the “Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid Machine Translation” (ML4HMT) which aims to foster research on improved system combination approaches for machine translation (MT). Participants of the challenge are requested to build hybrid translations by combining the output of several MT systems of different types. We first describe the ML4HMT corpus used in the shared task, then explain the XLIFF-based annotation format we have designed for it, and briefly summarize the participating systems. Using both automated metrics scores and extensive manual evaluation, we discuss the individual performance of the various systems. An interesting result from the shared task is the fact that we were able to observe different systems winning according to the automated metrics scores when compared to the results from the manual evaluation. We conclude by summarising the first edition of the challenge and by giving an outlook to future work.

pdf bib
Quality estimation for Machine Translation output using linguistic analysis and decoding features
Eleftherios Avramidis
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Comparative Quality Estimation: Automatic Sentence-Level Ranking of Multiple Machine Translation Outputs
Eleftherios Avramidis
Proceedings of COLING 2012

2011

pdf bib
Evaluate with Confidence Estimation: Machine ranking of translation outputs using grammatical features
Eleftherios Avramidis | Maja Popovic | David Vilar | Aljoscha Burchardt
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Evaluation without references: IBM1 scores as evaluation metrics
Maja Popović | David Vilar | Eleftherios Avramidis | Aljoscha Burchardt
Proceedings of the Sixth Workshop on Statistical Machine Translation

2008

pdf bib
Enriching Morphologically Poor Languages for Statistical Machine Translation
Eleftherios Avramidis | Philipp Koehn
Proceedings of ACL-08: HLT