Johanna Gerlach


Constructing Multimodal Language Learner Texts Using LARA: Experiences with Nine Languages
Elham Akhlaghi | Branislav Bédi | Fatih Bektaş | Harald Berthelsen | Matthias Butterweck | Cathy Chua | Catia Cucchiarini | Gülşen Eryiğit | Johanna Gerlach | Hanieh Habibi | Neasa Ní Chiaráin | Manny Rayner | Steinþór Steingrímsson | Helmer Strik
Proceedings of the 12th Language Resources and Evaluation Conference

LARA (Learning and Reading Assistant) is an open source platform whose purpose is to support easy conversion of plain texts into multimodal online versions suitable for use by language learners. This involves semi-automatically tagging the text, adding other annotations and recording audio. The platform is suitable for creating texts in multiple languages via crowdsourcing techniques that can be used for teaching a language via reading and listening. We present results of initial experiments by various collaborators where we measure the time required to produce substantial LARA resources, up to the length of short novels, in Dutch, English, Farsi, French, German, Icelandic, Irish, Swedish and Turkish. The first results are encouraging. Although there are some startup problems, the conversion task seems manageable for the languages tested so far. The resulting enriched texts are posted online and are freely available in both source and compiled form.

COPECO: a Collaborative Post-Editing Corpus in Pedagogical Context
Jonathan Mutal | Pierrette Bouillon | Perrine Schumacher | Johanna Gerlach
Proceedings of 1st Workshop on Post-Editing in Modern-Day Translation

Ellipsis Translation for a Medical Speech to Speech Translation System
Jonathan Mutal | Johanna Gerlach | Pierrette Bouillon | Hervé Spechbach
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

In diagnostic interviews, elliptical utterances allow doctors to question patients in a more efficient and economical way. However, literal translation of such incomplete utterances is rarely possible without affecting communication. Previous studies have focused on automatic ellipsis detection and resolution, but only a few specifically address the problem of automatic translation of ellipsis. In this work, we evaluate four different approaches to translate ellipsis in medical dialogues in the context of the speech to speech translation system BabelDr. We also investigate the impact of training data, using an under-sampling method and data with elliptical utterances in context. Results show that the best model is able to translate 88% of elliptical utterances.


Monolingual backtranslation in a medical speech translation system for diagnostic interviews - a NMT approach
Jonathan Mutal | Pierrette Bouillon | Johanna Gerlach | Paula Estrella | Hervé Spechbach
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks


A Shared Task for Spoken CALL?
Claudia Baur | Johanna Gerlach | Manny Rayner | Martin Russell | Helmer Strik
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We argue that the field of spoken CALL needs a shared task in order to facilitate comparisons between different groups and methodologies, and describe a concrete example of such a task, based on data collected from a speech-enabled online tool which has been used to help young Swiss German teens practise skills in English conversation. Items are prompt-response pairs, where the prompt is a piece of German text and the response is a recorded English audio file. The task is to label pairs as “accept” or “reject”, accepting responses which are grammatically and linguistically correct to match a set of hidden gold standard answers as closely as possible. Initial resources are provided so that a scratch system can be constructed with a minimal investment of effort, and in particular without necessarily using a speech recogniser. Training data for the task will be released in June 2016, and test data in January 2017.

An Open Web Platform for Rule-Based Speech-to-Sign Translation
Manny Rayner | Pierrette Bouillon | Sarah Ebling | Johanna Gerlach | Irene Strasly | Nikos Tsourakis
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)


The ACCEPT Academic Portal: Bringing Together Pre-editing, MT and Post-editing into a Learning Environment
Pierrette Bouillon | Johanna Gerlach | Asheesh Gulati | Victoria Porro | Violeta Seretan
Proceedings of the 18th Annual Conference of the European Association for Machine Translation



A Large-Scale Evaluation of Pre-editing Strategies for Improving User-Generated Content Translation
Violeta Seretan | Pierrette Bouillon | Johanna Gerlach
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

User-generated content represents an increasing share of the information available today. To make this type of content instantly accessible in another language, the ACCEPT project focuses on developing pre-editing technologies for correcting the source text in order to increase its translatability. Linguistically-informed pre-editing rules have been developed for English and French for the two domains considered by the project, namely, the technical domain and the healthcare domain. In this paper, we present the evaluation experiments carried out to assess the impact of the proposed pre-editing rules on translation quality. Results from a large-scale evaluation campaign show that pre-editing does indeed help attain better translation quality for a high proportion of the data, and that the difference from the number of cases where an adverse effect is observed is statistically significant. The ACCEPT pre-editing technology is freely available online and can be used in any Web-based environment to enhance the translatability of user-generated content so that it reaches a broader audience.


Two Approaches to Correcting Homophone Confusions in a Hybrid Machine Translation System
Pierrette Bouillon | Johanna Gerlach | Ulrich Germann | Barry Haddow | Manny Rayner
Proceedings of the Second Workshop on Hybrid Approaches to Translation

Can lightweight pre-editing rules improve statistical MT of forum content? (La préédition avec des règles peu coûteuses, utile pour la TA statistique des forums ?) [in French]
Johanna Gerlach | Victoria Porro | Pierrette Bouillon | Sabine Lehmann
Proceedings of TALN 2013 (Volume 2: Short Papers)


Evaluating Appropriateness Of System Responses In A Spoken CALL Game
Manny Rayner | Pierrette Bouillon | Johanna Gerlach
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe an experiment carried out using a French version of CALL-SLT, a web-enabled CALL game in which students at each turn are prompted to give a semi-free spoken response which the system then either accepts or rejects. The central question we investigate is whether the response is appropriate; we do this by extracting pairs of utterances where both members of the pair are responses by the same student to the same prompt, and where one response is accepted and one rejected. When the two spoken responses are presented in random order, native speakers show a reasonable degree of agreement in judging that the accepted utterance is better than the rejected one. We discuss the significance of the results and also present a small study supporting the claim that native speakers are nearly always recognised by the system, while non-native speakers are rejected a significant proportion of the time.


A Multilingual CALL Game Based on Speech Translation
Manny Rayner | Pierrette Bouillon | Nikos Tsourakis | Johanna Gerlach | Maria Georgescul | Yukie Nakao | Claudia Baur
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We describe a multilingual Open Source CALL game, CALL-SLT, which reuses speech translation technology developed using the Regulus platform to create an automatic conversation partner that allows intermediate-level language students to improve their fluency. We contrast CALL-SLT with Wang's and Seneff's “translation game” system, in particular focussing on three issues. First, we argue that the grammar-based recognition architecture offered by Regulus is more suitable for this type of application; second, that it is preferable to prompt the student in a language-neutral form, rather than in the L1; and third, that we can profitably record successful interactions by native speakers and store them to be reused as online help for students. The current system, which will be demoed at the conference, supports four L2s (English, French, Japanese and Swedish) and two L1s (English and French). We conclude by describing an evaluation exercise, where a version of CALL-SLT configured for English L2 and French L1 was used by several hundred high school students. About half of the subjects reported positive impressions of the system.