Lieve Macken


2020

Literary Machine Translation under the Magnifying Glass: Assessing the Quality of an NMT-Translated Detective Novel on Document Level
Margot Fonteyne | Arda Tezcan | Lieve Macken
Proceedings of the 12th Language Resources and Evaluation Conference

Several studies (covering many language pairs and translation tasks) have demonstrated that translation quality has improved enormously since the emergence of neural machine translation systems. This raises the question of whether such systems are able to produce high-quality translations for more creative text types such as literature, and whether they are able to generate coherent translations at document level. Our study aimed to investigate these two questions by carrying out a document-level evaluation of the raw NMT output of an entire novel. We translated Agatha Christie’s novel The Mysterious Affair at Styles with Google’s NMT system from English into Dutch and annotated it in two steps: first all fluency errors, then all accuracy errors. We report on the overall quality, determine the remaining issues, compare the most frequent error types to those in general-domain MT, and investigate whether any accuracy and fluency errors co-occur regularly. Additionally, we assess the inter-annotator agreement on the first chapter of the novel.

Assessing the Comprehensibility of Automatic Translations (ArisToCAT)
Lieve Macken | Margot Fonteyne | Arda Tezcan | Joke Daems
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The ArisToCAT project aims to assess the comprehensibility of ‘raw’ (unedited) MT output for readers who can only rely on the MT output. In this project description, we summarize the main results of the project and present future work.

2019

Modelling word translation entropy and syntactic equivalence with machine learning
Bram Vanroy | Orphée De Clercq | Lieve Macken
Proceedings of the Second MEMENTO workshop on Modelling Parameters of Cognitive Effort in Translation Production

When a ‘sport’ is a person and other issues for NMT of novels
Arda Tezcan | Joke Daems | Lieve Macken
Proceedings of the Qualities of Literary Machine Translation

2018

A fine-grained error analysis of NMT, SMT and RBMT output for English-to-Dutch
Laura Van Brussel | Arda Tezcan | Lieve Macken
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

UGENT-LT3 SCATE Submission for WMT16 Shared Task on Quality Estimation
Arda Tezcan | Véronique Hoste | Lieve Macken
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Detecting Grammatical Errors in Machine Translation Output Using Dependency Parsing and Treebank Querying
Arda Tezcan | Veronique Hoste | Lieve Macken
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

2015

Smart Computer Aided Translation Environment
Vincent Vandeghinste | Tom Vanallemeersch | Frank Van Eynde | Geert Heyman | Sien Moens | Joris Pelemans | Patrick Wambacq | Iulianna Van der Lek - Ciudin | Arda Tezcan | Lieve Macken | Véronique Hoste | Eva Geurts | Mieke Haesen
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

UGENT-LT3 SCATE System for Machine Translation Quality Estimation
Arda Tezcan | Veronique Hoste | Bart Desmet | Lieve Macken
Proceedings of the Tenth Workshop on Statistical Machine Translation

Smart Computer Aided Translation Environment - SCATE
Vincent Vandeghinste | Tom Vanallemeersch | Frank Van Eynde | Geert Heyman | Sien Moens | Joris Pelemans | Patrick Wambacq | Iulianna Van der Lek - Ciudin | Arda Tezcan | Lieve Macken | Véronique Hoste | Eva Geurts | Mieke Haesen
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

On the origin of errors: A fine-grained analysis of MT and PE errors and their relationship
Joke Daems | Lieve Macken | Sonia Vandepitte
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In order to improve the symbiosis between machine translation (MT) system and post-editor, it is not enough to know that the output of one system is better than the output of another system. A fine-grained error analysis is needed to provide information on the type and location of errors occurring in MT and the corresponding errors occurring after post-editing (PE). This article reports on a fine-grained translation quality assessment approach which was applied to machine-translated texts and the post-edited versions of these texts, made by student post-editors. By linking each error to the corresponding source-text passage, it is possible to identify passages that were problematic in MT but not after PE, or passages that were problematic even after PE. This method provides rich data on the origin and impact of errors, which can be used to improve post-editor training as well as machine translation systems. We present the results of a pilot experiment on the post-editing of newspaper articles and highlight the advantages of our approach.

2012

From keystrokes to annotated process data: Enriching the output of Inputlog with linguistic information
Lieve Macken | Veronique Hoste | Mariëlle Leijten | Luuk Van Waes
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Keystroke logging tools are a valuable aid to monitor written language production. These tools record all keystrokes, including backspaces and deletions, together with timing information. In this paper we report on an extension to the keystroke logging program Inputlog in which we aggregate the logged process data from the keystroke (character) level to the word level. The logged process data are further enriched with different kinds of linguistic information: part-of-speech tags, lemmata, chunk boundaries, syllable boundaries and word frequency. A dedicated parser has been developed that distils word-level revisions, deleted fragments and final product data from the logged process data. The linguistically annotated output will facilitate the linguistic analysis of the logged data and will provide a valuable basis for more linguistically oriented writing process research. The set-up of the extension to Inputlog is largely language-independent. As proof of concept, the extension has been developed for English and Dutch. Inputlog is freely available for research purposes.

From Character to Word Level: Enabling the Linguistic Analyses of Inputlog Process Data
Mariëlle Leijten | Lieve Macken | Veronique Hoste | Eric Van Horenbeeck | Luuk Van Waes
Proceedings of the Second Workshop on Computational Linguistics and Writing (CL&W 2012): Linguistic and Cognitive Aspects of Document Creation and Document Engineering

2010

An Annotation Scheme and Gold Standard for Dutch-English Word Alignment
Lieve Macken
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The importance of sentence-aligned parallel corpora has been widely acknowledged. Reference corpora in which sub-sentential translational correspondences are indicated manually are more labour-intensive to create, and hence less widespread. Such manually created reference alignments (also called Gold Standards) have been used in research projects to develop or test automatic word alignment systems. In most translations, translational correspondences are rather complex; for example, word-by-word correspondences can be found only for a limited number of words. A reference corpus in which those complex translational correspondences are aligned manually is therefore also a useful resource for the development of translation tools and for translation studies. In this paper, we describe how we created a Gold Standard for the Dutch-English language pair. We present the annotation scheme, annotation guidelines, annotation tool and inter-annotator results. To cover a wide range of syntactic and stylistic phenomena that emerge from different writing and translation styles, our Gold Standard data set contains texts from different text types. The Gold Standard will be publicly available as part of the Dutch Parallel Corpus.

2009

Language-Independent Bilingual Terminology Extraction from a Multilingual Parallel Corpus
Els Lefever | Lieve Macken | Veronique Hoste
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

Sentence Alignment in DPC: Maximizing Precision, Minimizing Human Effort
Julia Trushkina | Lieve Macken | Hans Paulussen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

A wide spectrum of multilingual applications require aligned parallel corpora as a prerequisite. The aim of the project described in this paper is to build a multilingual corpus in which all sentences are aligned at very high precision with minimal human effort. The experiments on a combination of sentence aligners with different underlying algorithms described in this paper showed that by verifying only those links which were not recognized by at least two aligners, the error rate can be reduced by 93.76% as compared to the performance of the best aligner. Such manual involvement concerned only a small portion of all data (6%). This significantly reduces the amount of manual work necessary to achieve nearly 100% accuracy of alignment.
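The selection rule described in this abstract (accept links proposed by at least two aligners, send the rest to manual verification) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_links`, the `min_votes` parameter, the link representation as pairs of sentence-id tuples, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def split_links(aligner_outputs, min_votes=2):
    """Split candidate links into auto-accepted links and links to verify.

    aligner_outputs: one iterable of links per aligner (hypothetical format:
    a link is a pair of tuples of source and target sentence ids).
    """
    votes = Counter()
    for links in aligner_outputs:
        # set() ensures each aligner votes at most once per link
        votes.update(set(links))
    accepted = {link for link, n in votes.items() if n >= min_votes}
    to_verify = set(votes) - accepted
    return accepted, to_verify


# Hypothetical toy outputs of three aligners (not data from the paper)
aligner_a = [((1,), (1,)), ((2,), (2,)), ((3,), (3, 4))]
aligner_b = [((1,), (1,)), ((2,), (2,)), ((3,), (3,))]
aligner_c = [((1,), (1,)), ((2, 3), (2,)), ((3,), (3, 4))]

accepted, to_verify = split_links([aligner_a, aligner_b, aligner_c])
print(sorted(accepted))   # links proposed by at least two aligners
print(sorted(to_verify))  # links left for manual verification
```

In this toy setup only the links proposed by a single aligner end up in `to_verify`, which mirrors the idea that manual checking is limited to a small share of the data.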

Linguistically-Based Sub-Sentential Alignment for Terminology Extraction from a Bilingual Automotive Corpus
Lieve Macken | Els Lefever | Veronique Hoste
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)