Maja Popović

Also published as: Maja Popovic


2020

pdf bib
Relations between comprehensibility and adequacy errors in machine translation output
Maja Popović
Proceedings of the 24th Conference on Computational Natural Language Learning

This work presents a detailed analysis of translation errors perceived by readers as comprehensibility and/or adequacy issues. The main finding is that good comprehensibility, similarly to good fluency, can mask a number of adequacy errors. Of all major adequacy errors, 30% were fully comprehensible, thus misleading the reader into accepting the incorrect information. Another 25% of major adequacy errors were perceived as almost comprehensible, thus being potentially misleading. Also, a vast majority of omissions (about 70%) are hidden by comprehensibility. Further analysis of misleading translations revealed that the most frequent error types are ambiguity, mistranslation, noun phrase error, word-by-word translation, untranslated word, subject-verb agreement, and spelling error in the source text. However, none of these error types appears exclusively in misleading translations; they are also frequent in fully incorrect (incomprehensible inadequate) and discarded correct (incomprehensible adequate) translations. Deeper analysis is needed to potentially detect underlying phenomena specifically related to misleading translations.

pdf bib
On Context Span Needed for Machine Translation Evaluation
Sheila Castilho | Maja Popović | Andy Way
Proceedings of the 12th Language Resources and Evaluation Conference

Despite increasing efforts to improve evaluation of machine translation (MT) by going beyond the sentence level to the document level, the definition of what exactly constitutes a “document level” is still not clear. This work deals with the context span necessary for a more reliable MT evaluation. We report results from a series of surveys involving three domains and 18 target languages designed to identify the necessary context span as well as issues related to it. Our findings indicate that, despite the fact that some issues and spans are strongly dependent on domain and on the target language, a number of common patterns can be observed so that general guidelines for context-aware MT evaluation can be drawn.

pdf bib
Proceedings of 1st Workshop on Post-Editing in Modern-Day Translation
John E. Ortega | Marcello Federico | Constantin Orasan | Maja Popovic
Proceedings of 1st Workshop on Post-Editing in Modern-Day Translation

pdf bib
Neural Machine Translation for translating into Croatian and Serbian
Maja Popović | Alberto Poncelas | Marija Brkic | Andy Way
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects

In this work, we systematically investigate different set-ups for training neural machine translation (NMT) systems for translation into Croatian and Serbian, two closely related South Slavic languages. We explore English and German as source languages, different sizes and types of training corpora, as well as bilingual and multilingual systems. We also explore translation of English IMDb user movie reviews, a domain/genre where only monolingual data are available. First, our results confirm that multilingual systems with joint target languages perform better. Furthermore, translation performance from English is much better than from German, partly because German is morphologically more complex and partly because the corpus consists mostly of parallel human translations instead of original text and its human translation. The translation from German should be further investigated systematically. For translating user reviews, creating synthetic in-domain parallel data through back- and forward-translation and adding them to a small out-of-domain parallel corpus can yield performance comparable with a system trained on a full out-of-domain corpus. However, it is still not clear what the optimal size of synthetic in-domain data is, especially for forward-translated data where the target language is machine translated. More detailed research including manual evaluation and analysis is needed in this direction.

pdf bib
Informative Manual Evaluation of Machine Translation Output
Maja Popović
Proceedings of the 28th International Conference on Computational Linguistics

This work proposes a new method for manual evaluation of Machine Translation (MT) output based on marking actual issues in the translated text. The novelty is that the evaluators are not assigning any scores, nor classifying errors, but marking all problematic parts (words, phrases, sentences) of the translation. The main advantage of this method is that the resulting annotations not only provide overall scores by counting words with assigned tags, but can also be used for analysis of errors and challenging linguistic phenomena, as well as inter-annotator disagreements. Detailed analysis and understanding of actual problems are not enabled by typical manual evaluations where the annotators are asked to assign overall scores or to rank two or more translations. The proposed method is very general: it can be applied to any genre/domain and language pair, and it can be guided by various types of quality criteria. Also, it is not restricted to MT output, but can be used for other types of generated text.

pdf bib
On the differences between human translations
Maja Popovic
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

Many studies have confirmed that translated texts exhibit different features than texts originally written in the given language. This work explores texts translated by different translators, taking into account expertise and native language. A set of computational analyses was conducted on three language pairs, English-Croatian, German-French and English-Finnish, and the results show that each of the factors has a certain influence on the features of the translated texts, especially on sentence length and lexical richness. The results also indicate that for translations used for machine translation evaluation, it is important to specify these factors, especially if comparing machine translation quality with human translation quality is involved.

pdf bib
QRev: Machine Translation of User Reviews: What Influences the Translation Quality?
Maja Popovic
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This project aims to identify the important aspects of translation quality of user reviews, which will represent a starting point for developing better automatic MT metrics and challenge test sets, and will also be helpful for developing MT systems for this genre. We work on two types of reviews: Amazon products and IMDb movies, written in English and translated into two closely related target languages, Croatian and Serbian.

2019

pdf bib
Building English-to-Serbian Machine Translation System for IMDb Movie Reviews
Pintu Lohar | Maja Popović | Andy Way
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

This paper reports the results of the first experiment dealing with the challenges of building a machine translation system for user-generated content involving a complex South Slavic language. We focus on translation of English IMDb user movie reviews into Serbian, in a low-resource scenario. We explore potentials and limits of (i) phrase-based and neural machine translation systems trained on out-of-domain clean parallel data from news articles, and (ii) creating an additional synthetic in-domain parallel corpus by machine-translating the English IMDb corpus into Serbian. Our main findings are that morphology and syntax are better handled by the neural approach than by the phrase-based approach even in this low-resource mismatched domain scenario; however, the situation is different for the lexical aspect, especially for person names. This finding also indicates that in general, machine translation of person names into Slavic languages (especially those which require/allow transcription) should be investigated more systematically.

pdf bib
Evaluating Conjunction Disambiguation on English-to-German and French-to-German WMT 2019 Translation Hypotheses
Maja Popović
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

We present a test set for evaluating an MT system’s capability to translate ambiguous conjunctions depending on the sentence structure. We concentrate on the English conjunction “but” and its French equivalent “mais” which can be translated into two different German conjunctions. We evaluate all English-to-German and French-to-German submissions to the WMT 2019 shared translation task. The evaluation is done mainly automatically, with additional fast manual inspection of unclear cases. All systems almost perfectly recognise the target conjunction “aber”, whereas accuracies for the other target conjunction “sondern” range from 78% to 97%, and the errors are mostly caused by replacing it with the alternative conjunction “aber”. The best performing system for both language pairs is a multilingual Transformer “TartuNLP” system trained on all WMT 2019 language pairs which use the Latin script, indicating that the multilingual approach is beneficial for conjunction disambiguation. As for other system features, such as using synthetic back-translated data, context-aware, hybrid, etc., no particular (dis)advantages can be observed. Qualitative manual inspection of translation hypotheses showed that highly ranked systems generally produce translations with high adequacy and fluency, meaning that these systems do not merely capture the right conjunction while the rest of the translation hypothesis is poor. On the other hand, the low ranked systems generally exhibit lower fluency and poor adequacy.

pdf bib
Automatic error classification with multiple error labels
Maja Popovic | David Vilar
Proceedings of Machine Translation Summit XVII Volume 1: Research Track

pdf bib
On reducing translation shifts in translations intended for MT evaluation
Maja Popovic
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

pdf bib
Proceedings of the Qualities of Literary Machine Translation
James Hadley | Maja Popović | Haithem Afli | Andy Way
Proceedings of the Qualities of Literary Machine Translation

bib
Challenge Test Sets for MT Evaluation
Maja Popović | Sheila Castilho
Proceedings of Machine Translation Summit XVII Volume 3: Tutorial Abstracts

Most of the test sets used for the evaluation of MT systems reflect the frequency distribution of different phenomena found in naturally occurring data (“standard” or “natural” test sets). However, to better understand particular strengths and weaknesses of MT systems, especially those based on neural networks, it is necessary to apply more focused evaluation procedures. Therefore, another type of test set (“challenge” test sets, also called “test suites”) is being increasingly employed in order to highlight points of difficulty which are relevant to model development, training, or use of the given system. This tutorial will be useful for anyone (researchers, developers, users, translators) interested in detailed evaluation and getting a better understanding of machine translation (MT) systems and models. The attendees will learn about the motivation and linguistic background of challenge test sets and a range of testing possibilities applied to the state-of-the-art MT systems, as well as a number of practical aspects and challenges.

pdf bib
Combining PBSMT and NMT Back-translated Data for Efficient NMT
Alberto Poncelas | Maja Popović | Dimitar Shterionov | Gideon Maillette de Buy Wenniger | Andy Way
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have become popular recently. One of these methods is back-translation, which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model. Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data using different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models and combinations of both. The results reveal that the models achieve the best performances when the training set is augmented with back-translated data created by merging different MT approaches.

pdf bib
Are ambiguous conjunctions problematic for machine translation?
Maja Popović | Sheila Castilho
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The translation of ambiguous words still poses challenges for machine translation. In this work, we carry out a systematic quantitative analysis regarding the ability of different machine translation systems to disambiguate the source language conjunctions “but” and “and”. We evaluate specialised test sets focused on the translation of these two conjunctions. The test sets contain source languages that do not distinguish different variants of the given conjunction, whereas the target languages do. In total, we evaluate the conjunction “but” on 20 translation outputs, and the conjunction “and” on 10. All machine translation systems almost perfectly recognise one variant of the target conjunction, especially for the source conjunction “but”. The other target variant, however, represents a challenge for machine translation systems, with accuracy varying from 50% to 95% for “but” and from 20% to 57% for “and”. The major error for all systems is replacing the correct target variant with the opposite one.

pdf bib
Automated Text Simplification as a Preprocessing Step for Machine Translation into an Under-resourced Language
Sanja Štajner | Maja Popović
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this work, we investigate the possibility of using a fully automatic text simplification system on the English source in machine translation (MT) for improving its translation into an under-resourced language. We use the state-of-the-art automatic text simplification (ATS) system for lexically and syntactically simplifying source sentences, which are then translated with two state-of-the-art English-to-Serbian MT systems, the phrase-based MT (PBMT) and the neural MT (NMT). We explore three different scenarios for using the ATS in MT: (1) using the raw output of the ATS; (2) automatically filtering out the sentences with low grammaticality and meaning preservation scores; and (3) performing a minimal manual correction of the ATS output. Our results show improvement in fluency of the translation regardless of the chosen scenario, and a difference in success of the three scenarios depending on the MT approach used (PBMT or NMT) with regard to improving translation fluency and post-editing effort.

2018

pdf bib
A Multilingual Wikified Data Set of Educational Material
Iris Hendrickx | Eirini Takoulidou | Thanasis Naskos | Katia Lida Kermanidis | Vilelmini Sosoni | Hugo de Vos | Maria Stasimioti | Menno van Zaanen | Panayota Georgakopoulou | Valia Kordoni | Maja Popovic | Markus Egg | Antal van den Bosch
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Complex Word Identification Using Character n-grams
Maja Popović
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper investigates the use of character n-gram frequencies for identifying complex words in English, German and Spanish texts. The approach is based on the assumption that complex words are likely to contain different character sequences than simple words. The multinomial Naive Bayes classifier was used with n-grams of different lengths as features, and the best results were obtained for the combination of 2-grams and 4-grams. This variant was submitted to the Complex Word Identification Shared Task 2018 for all texts and achieved F-scores between 70% and 83%. The system was ranked in the middle range for all English texts, as third of fourteen submissions for German, and as tenth of seventeen submissions for Spanish. The method is not very convenient for the cross-language task, achieving only 59% on the French text.
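
The approach described in this abstract can be illustrated with a minimal sketch, written under stated assumptions rather than as the submitted system: a multinomial Naive Bayes classifier over character 2-grams and 4-grams. The boundary markers `^` and `$`, the add-one smoothing, the class name `CharNgramNB`, and the toy training words are all hypothetical choices made for this illustration.

```python
import math
from collections import Counter


def char_ngrams(word, sizes=(2, 4)):
    """Return character n-grams of the given sizes from a boundary-padded word."""
    padded = f"^{word.lower()}$"  # boundary markers are an assumption of this sketch
    return [padded[i:i + n] for n in sizes for i in range(len(padded) - n + 1)]


class CharNgramNB:
    """Multinomial Naive Bayes over character n-gram counts (add-one smoothing)."""

    def fit(self, words, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.feature_counts = {c: Counter() for c in self.classes}
        for word, label in zip(words, labels):
            self.feature_counts[label].update(char_ngrams(word))
        # Vocabulary of all n-grams seen in training, used for smoothing.
        self.vocab = set().union(*self.feature_counts.values())
        return self

    def predict(self, word):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            counts = self.feature_counts[c]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods of each n-gram.
            score = math.log(self.class_counts[c] / total)
            for gram in char_ngrams(word):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy data, invented for illustration only.
simple_words = ["cat", "dog", "sun", "run", "hat"]
complex_words = ["photosynthesis", "metamorphosis", "hypothesis",
                 "thermodynamics", "psychosis"]
clf = CharNgramNB().fit(
    simple_words + complex_words,
    ["simple"] * len(simple_words) + ["complex"] * len(complex_words),
)
```

With this toy data, a word sharing many character sequences with the complex class (such as "synthesis") is assigned to that class, which mirrors the assumption stated in the abstract that complex words tend to contain different character sequences than simple ones.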

pdf bib
Improving Machine Translation of English Relative Clauses with Automatic Text Simplification
Sanja Štajner | Maja Popović
Proceedings of the 1st Workshop on Automatic Text Adaptation (ATA)

2017

pdf bib
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Maja Popović | Jordan Boyd-Graber
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

pdf bib
chrF++: words helping character n-grams
Maja Popović
Proceedings of the Second Conference on Machine Translation

2016

pdf bib
PE2rr Corpus: Manual Error Annotation of Automatically Pre-annotated MT Post-edits
Maja Popović | Mihael Arčan
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a freely available corpus containing source language texts from different domains along with their automatically generated translations into several distinct morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. We believe that the corpus will be useful for many different applications. The main advantage of the approach used for creation of the corpus is the fusion of post-editing and error classification tasks, which have usually been seen as two independent tasks, although naturally they are not. We also show benefits of coupling automatic and manual error classification which facilitates the complex manual error annotation task as well as the development of automatic error classification tools. In addition, the approach facilitates annotation of language pair related issues.

pdf bib
Tools and Guidelines for Principled Machine Translation Development
Nora Aranberri | Eleftherios Avramidis | Aljoscha Burchardt | Ondřej Klejch | Martin Popel | Maja Popović
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This work addresses the need to aid Machine Translation (MT) development cycles with a complete workflow of MT evaluation methods. Our aim is to assess, compare and improve MT system variants. We hereby report on novel tools and practices that support various measures, developed in order to enable a principled and informed approach to MT development. Our toolkit for automatic evaluation showcases quick and detailed comparison of MT system variants through automatic metrics and n-gram feedback, along with manual evaluation via edit-distance, error annotation and task-based feedback.

pdf bib
TraMOOC (Translation for Massive Open Online Courses): providing reliable MT for MOOCs
Valia Kordoni | Lexi Birch | Ioana Buliga | Kostadin Cholakov | Markus Egg | Federico Gaspari | Yota Georgakopolou | Maria Gialama | Iris Hendrickx | Mitja Jermol | Katia Kermanidis | Joss Moorkens | Davor Orlic | Michael Papadopoulos | Maja Popović | Rico Sennrich | Vilelmini Sosoni | Dimitrios Tsoumakos | Antal van den Bosch | Menno van Zaanen | Andy Way
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

pdf bib
chrF deconstructed: beta parameters and n-gram weights
Maja Popović
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Potential and Limits of Using Post-edits as Reference Translations for MT Evaluation
Maja Popovic | Mihael Arčan | Arle Lommel
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf bib
Can Text Simplification Help Machine Translation?
Sanja Štajner | Maja Popovic
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf bib
Language Related Issues for Machine Translation between Closely Related South Slavic Languages
Maja Popović | Mihael Arčan | Filip Klubička
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

Machine translation between closely related languages is less challenging and exhibits a smaller number of translation errors than translation between distant languages, but there are still obstacles which should be addressed in order to improve such systems. This work explores the obstacles for machine translation systems between closely related South Slavic languages, namely Croatian, Serbian and Slovenian. Statistical systems for all language pairs and translation directions are trained using parallel texts from different domains, although mainly on spoken language, i.e. subtitles. For translation between Serbian and Croatian, a rule-based system is also explored. It is shown that for all language pairs and translation systems, the main obstacles are differences between structural properties.

pdf bib
Enlarging Scarce In-domain English-Croatian Corpus for SMT of MOOCs Using Serbian
Maja Popović | Kostadin Cholakov | Valia Kordoni | Nikola Ljubešić
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

Massive Open Online Courses have been growing rapidly in size and impact. Yet the language barrier constitutes a major growth impediment in reaching out to all people and educating all citizens. A vast majority of educational material is available only in English, and state-of-the-art machine translation systems still have not been tailored for this peculiar genre. In addition, a mere collection of appropriate in-domain training material is a challenging task. In this work, we investigate statistical machine translation of lecture subtitles from English into Croatian, which is morphologically rich and generally weakly supported, especially for the educational domain. We show that results comparable with publicly available systems trained on much larger data can be achieved if a small in-domain training set is used in combination with an additional in-domain corpus originating from the closely related Serbian language.

2015

pdf bib
DFKI’s experimental hybrid MT system for WMT 2015
Eleftherios Avramidis | Maja Popović | Aljoscha Burchardt
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
chrF: character n-gram F-score for automatic MT evaluation
Maja Popović
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Identifying main obstacles for statistical machine translation of morphologically rich South Slavic languages
Maja Popović | Mihael Arčan
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Poor man’s lemmatisation for automatic error classification
Maja Popović | Mihael Arčan | Eleftherios Avramidis | Aljoscha Burchardt | Arle Lommel
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Towards Deeper MT - A Hybrid System for German
Eleftherios Avramidis | Aljoscha Burchardt | Maja Popović | Hans Uszkoreit
Proceedings of the 1st Deep Machine Translation Workshop

2014

pdf bib
Exploring cross-language statistical machine translation for closely related South Slavic languages
Maja Popović | Nikola Ljubešić
Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants

pdf bib
Correlating decoding events with errors in Statistical Machine Translation
Eleftherios Avramidis | Maja Popović
Proceedings of the 11th International Conference on Natural Language Processing

pdf bib
Using a new analytic measure for the annotation and analysis of MT errors on real data
Arle Lommel | Aljoscha Burchardt | Maja Popović | Kim Harris | Eleftherios Avramidis | Hans Uszkoreit
Proceedings of the 17th Annual conference of the European Association for Machine Translation

pdf bib
Relations between different types of post-editing operations, cognitive effort and temporal effort
Maja Popović | Arle Lommel | Aljoscha Burchardt | Eleftherios Avramidis | Hans Uszkoreit
Proceedings of the 17th Annual conference of the European Association for Machine Translation

pdf bib
The taraXÜ corpus of human-annotated machine translations
Eleftherios Avramidis | Aljoscha Burchardt | Sabine Hunsicker | Maja Popović | Cindy Tscherwinka | David Vilar | Hans Uszkoreit
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Human translators are the key to evaluating machine translation (MT) quality and also to addressing the so far unanswered question of when and how to use MT in professional translation workflows. This paper describes the corpus developed as a result of a detailed large-scale human evaluation consisting of three tightly connected tasks: ranking, error classification and post-editing.

2013

pdf bib
Selecting Feature Sets for Comparative and Time-Oriented Quality Estimation of Machine Translation Output
Eleftherios Avramidis | Maja Popović
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

pdf bib
Involving Language Professionals in the Evaluation of Machine Translation
Eleftherios Avramidis | Aljoscha Burchardt | Christian Federmann | Maja Popović | Cindy Tscherwinka | David Vilar
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Significant breakthroughs in machine translation only seem possible if human translators are taken into the loop. While automatic evaluation and scoring mechanisms such as BLEU have enabled the fast development of systems, it is not clear how systems can meet real-world (quality) requirements in industrial translation scenarios today. The taraXÜ project paves the way for wide usage of hybrid machine translation outputs through various feedback loops in system development. In a consortium of research and industry partners, the project integrates human translators into the development process for rating and post-editing of machine translation outputs thus collecting feedback for possible improvements.

pdf bib
Automatic MT Error Analysis: Hjerson Helping Addicter
Jan Berka | Ondřej Bojar | Mark Fishel | Maja Popović | Daniel Zeman
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present a complex, open source tool for detailed machine translation error analysis providing the user with automatic error detection and classification, several monolingual alignment algorithms as well as with training and test corpus browsing. The tool is the result of a merge of automatic error detection and classification of Hjerson (Popović, 2011) and Addicter (Zeman et al., 2011) into the pipeline and web visualization of Addicter. It classifies errors into categories similar to those of Vilar et al. (2006), such as: morphological, reordering, missing words, extra words and lexical errors. The graphical user interface shows alignments in both training corpus and test data; the different classes of errors are colored. Also, the summary of errors can be displayed to provide an overall view of the MT system's weaknesses. The tool was developed in Linux, but it was tested on Windows too.

pdf bib
Terra: a Collection of Translation Error-Annotated Corpora
Mark Fishel | Ondřej Bojar | Maja Popović
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Recently the first methods of automatic diagnostics of machine translation have emerged; since this area of research is relatively young, the efforts are not coordinated. We present a collection of translation error-annotated corpora, consisting of automatically produced translations and their detailed manual translation error analysis. Using the collected corpora, we evaluate the available state-of-the-art methods of MT diagnostics and assess how well the methods perform, how they compare to each other and whether they can be useful in practice.

pdf bib
TerrorCat: a Translation Error Categorization-based MT Quality Metric
Mark Fishel | Rico Sennrich | Maja Popović | Ondřej Bojar
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Class error rates for evaluation of machine translation output
Maja Popović
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Morpheme- and POS-based IBM1 and language model scores for translation quality estimation
Maja Popović
Proceedings of the Seventh Workshop on Statistical Machine Translation

2011

pdf bib
Evaluate with Confidence Estimation: Machine ranking of translation outputs using grammatical features
Eleftherios Avramidis | Maja Popovic | David Vilar | Aljoscha Burchardt
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Evaluation without references: IBM1 scores as evaluation metrics
Maja Popović | David Vilar | Eleftherios Avramidis | Aljoscha Burchardt
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Morphemes and POS tags for n-gram based evaluation metrics
Maja Popović
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
From Human to Automatic Error Classification for Machine Translation Output
Maja Popović | Aljoscha Burchardt
Proceedings of the 15th Annual conference of the European Association for Machine Translation

pdf bib
Towards Automatic Error Analysis of Machine Translation Output
Maja Popović | Hermann Ney
Computational Linguistics, Volume 37, Issue 4 - December 2011

2009

pdf bib
Syntax-Oriented Evaluation Measures for Machine Translation Output
Maja Popović | Hermann Ney
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
The RWTH Machine Translation System for WMT 2009
Maja Popović | David Vilar | Daniel Stein | Evgeny Matusov | Hermann Ney
Proceedings of the Fourth Workshop on Statistical Machine Translation

2007

pdf bib
Word Error Rates: Decomposition over POS classes and Applications for Error Analysis
Maja Popović | Hermann Ney
Proceedings of the Second Workshop on Statistical Machine Translation

2006

pdf bib
POS-based Word Reorderings for Statistical Machine Translation
Maja Popović | Hermann Ney
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this work we investigate new possibilities for improving the quality of statistical machine translation (SMT) by applying word reorderings of the source language sentences based on Part-of-Speech tags. Results are presented on the European Parliament corpus containing about 700k sentences and 15M running words. In order to investigate sparse training data scenarios, we also report results obtained on about 1% of the original corpus. The source languages are Spanish and English and target languages are Spanish, English and German. We propose two types of reorderings depending on the language pair and the translation direction: local reorderings of nouns and adjectives for translation from and into Spanish and long-range reorderings of verbs for translation into German. For our best translation system, we achieve up to 2% relative reduction of WER and up to 7% relative increase of BLEU score. Improvements can be seen both on the reordered sentences as well as on the rest of the test corpus. Local reorderings are especially important for the translation systems trained on the small corpus whereas long-range reorderings are more effective for the larger corpus.
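
The local noun-adjective reordering mentioned in this abstract might look roughly like the sketch below. This is an illustration under assumptions, not the paper's actual rule set: the tag names `NOUN` and `ADJ`, the function name `reorder_noun_adjective`, and the rule itself (move a run of adjectives that directly follows a noun in front of that noun, as one might do on a Spanish source before translating into English) are hypothetical.

```python
def reorder_noun_adjective(tagged):
    """Move each run of adjectives that directly follows a noun in front of it.

    `tagged` is a list of (word, pos_tag) pairs; the tag names are assumed.
    """
    out = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag == "NOUN":
            # Find the end of the adjective run immediately after the noun.
            j = i + 1
            while j < len(tagged) and tagged[j][1] == "ADJ":
                j += 1
            out.extend(tagged[i + 1:j])  # adjectives first
            out.append((word, tag))      # then the noun
            i = j
        else:
            out.append((word, tag))
            i += 1
    return out


# Hand-tagged Spanish example, invented for illustration.
source = [("la", "DET"), ("casa", "NOUN"), ("blanca", "ADJ"),
          ("es", "VERB"), ("grande", "ADJ")]
reordered = [w for w, _ in reorder_noun_adjective(source)]
```

Only the adjective adjacent to the noun is moved; the predicative adjective after the verb stays in place, so the reordered source follows English-like word order ("la blanca casa es grande").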

pdf bib
Morpho-syntactic Information for Automatic Error Analysis of Statistical Machine Translation Output
Maja Popović | Adrià de Gispert | Deepa Gupta | Patrik Lambert | Hermann Ney | José B. Mariño | Marcello Federico | Rafael Banchs
Proceedings on the Workshop on Statistical Machine Translation

2005

pdf bib
Exploiting phrasal lexica and additional morpho-syntactic language resources for statistical machine translation with scarce training data
Maja Popovic | Hermann Ney
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

pdf bib
Augmenting a Small Parallel Text with Morpho-Syntactic Language
Maja Popović | David Vilar | Hermann Ney | Slobodan Jovičić | Zoran Šarić
Proceedings of the ACL Workshop on Building and Using Parallel Texts

2004

pdf bib
Improving Word Alignment Quality using Morpho-syntactic Information
Hermann Ney | Maja Popovic
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Towards the Use of Word Stems and Suffixes for Statistical Machine Translation
Maja Popović | Hermann Ney
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Error Measures and Bayes Decision Rules Revisited with Applications to POS Tagging
Hermann Ney | Maja Popović | David Sündermann
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing