Eunah Cho


2019

Graph-Based Semi-Supervised Learning for Natural Language Understanding
Zimeng Qiu | Eunah Cho | Xiaochun Ma | William Campbell
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

Semi-supervised learning is an efficient method to augment training data automatically from unlabeled data. A challenge in developing many natural language understanding (NLU) applications is that unlabeled data is relatively abundant while labeled data is rather limited. In this work, we propose transductive graph-based semi-supervised learning models as well as their inductive variants for NLU. We evaluate the approach’s applicability using publicly available NLU data and models. To find similar utterances and construct a graph, we use a paraphrase detection model. Results show that applying inductive graph-based semi-supervised learning can improve the error rate of the NLU model by 5%.
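The general idea of transductive graph-based semi-supervised learning can be illustrated with classic label propagation: utterances become nodes, pairwise similarity scores become edge weights, and intent labels spread from the few labeled nodes to their unlabeled neighbors. This is a minimal sketch under stated assumptions, not the paper's model. It covers only the transductive case, the token-overlap function `jaccard` is a hypothetical stand-in for the paraphrase detection model, and the utterances, intent names, and threshold are made up for illustration.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in for a paraphrase detection score: token-set overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_graph(utts: List[str], threshold: float) -> List[Dict[int, float]]:
    """Connect every utterance pair whose similarity reaches the threshold."""
    n = len(utts)
    graph: List[Dict[int, float]] = [{} for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = jaccard(utts[i], utts[j])
            if w >= threshold:
                graph[i][j] = w  # undirected edge stored in both directions
                graph[j][i] = w
    return graph


def propagate(graph: List[Dict[int, float]], seeds: Dict[int, str],
              labels: List[str], iters: int = 20) -> List[str]:
    """Transductive label propagation; labeled (seed) nodes stay clamped."""
    n = len(graph)
    # Start every node from a uniform label distribution.
    dist = [{m: 1.0 / len(labels) for m in labels} for _ in range(n)]
    # Clamp seed nodes to a one-hot distribution over their known label.
    for i, lab in seeds.items():
        dist[i] = {m: 1.0 if m == lab else 0.0 for m in labels}
    for _ in range(iters):
        new_dist = []
        for i in range(n):
            if i in seeds or not graph[i]:
                # Seeds stay clamped; isolated nodes keep their current distribution.
                new_dist.append(dist[i])
                continue
            total = sum(graph[i].values())
            # Weighted average of the neighbors' distributions from the previous step.
            new_dist.append({
                m: sum(w * dist[j][m] for j, w in graph[i].items()) / total
                for m in labels
            })
        dist = new_dist
    # Highest-probability label per node (ties go to the first label in `labels`).
    return [max(d, key=d.get) for d in dist]


utterances = [
    "play some jazz music",            # labeled: PlayMusic
    "what is the weather today",       # labeled: GetWeather
    "play jazz music please",          # unlabeled
    "what is the weather like today",  # unlabeled
]
seed_labels = {0: "PlayMusic", 1: "GetWeather"}
graph = build_graph(utterances, threshold=0.5)
predicted = propagate(graph, seed_labels, labels=["PlayMusic", "GetWeather"])
print(predicted)  # ['PlayMusic', 'GetWeather', 'PlayMusic', 'GetWeather']
```

Each unlabeled utterance here inherits the intent of its only neighbor. An inductive variant would typically train a classifier on the propagated labels, so that new utterances can be labeled without rebuilding the graph.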

Paraphrase Generation for Semi-Supervised Learning in NLU
Eunah Cho | He Xie | William M. Campbell
Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation

Semi-supervised learning is an efficient way to improve performance for natural language processing systems. In this work, we propose Para-SSL, a scheme to generate candidate utterances using paraphrasing and methods from semi-supervised learning. To perform paraphrase generation in the context of a dialog system, we automatically extract paraphrase pairs to create a paraphrase corpus. Using this data, we build a paraphrase generation system and perform one-to-many generation, followed by a validation step that selects only high-quality utterances. The paraphrase-based semi-supervised learning is applied to five functionalities in a natural language understanding system. Our proposed method for semi-supervised learning using paraphrase generation does not require user utterances and can be applied before a new functionality is released to a system. Experiments show that we can achieve up to 19% relative slot error reduction without access to user utterances, and up to 35% when leveraging live traffic utterances.

2017

Analyzing Neural MT Search and Model Performance
Jan Niehues | Eunah Cho | Thanh-Le Ha | Alex Waibel
Proceedings of the First Workshop on Neural Machine Translation

In this paper, we offer an in-depth analysis of modeling and search performance in neural machine translation (NMT). We address the question of whether a more complex search algorithm is necessary. Furthermore, we investigate whether more complex models, which might only be applicable during rescoring, are promising. By separating the search space and the modeling using n-best list reranking, we analyze the influence of both parts of an NMT system independently. By comparing NMT systems of different quality, we show that the better translation is already in the search space of the weaker translation systems. These results indicate that current search algorithms are sufficient for NMT systems. Furthermore, we show that even a relatively small n-best list of 50 hypotheses already contains notably better translations.
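The question of whether a better translation is already in the n-best list can be illustrated with an oracle over the model's top-k hypotheses. This is a minimal sketch under stated assumptions, not the paper's experimental setup. The n-best list, its model scores, and the reference are toy values, and `unigram_f1` is a hypothetical stand-in for a sentence-level metric such as sentence BLEU.

```python
from collections import Counter
from typing import List, Tuple


def unigram_f1(hyp: str, ref: str) -> float:
    """Hypothetical stand-in for a sentence-level MT metric (clipped unigram F1)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # multiset intersection clips repeated tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def oracle_at_k(nbest: List[Tuple[str, float]], ref: str, k: int) -> Tuple[str, float]:
    """Return the metric-best hypothesis among the model's k highest-scoring ones."""
    # Keep the model's top-k hypotheses (higher model score = preferred by the model).
    top_k = sorted(nbest, key=lambda item: item[1], reverse=True)[:k]
    # Among them, pick the one the reference-based metric rates highest.
    best_hyp = max((hyp for hyp, _ in top_k), key=lambda h: unigram_f1(h, ref))
    return best_hyp, unigram_f1(best_hyp, ref)


reference = "the cat sat on the mat"
nbest = [
    ("the cat sat on mat", -1.2),      # model's top-1
    ("a cat sat on the mat", -1.5),
    ("the cat sat on the mat", -2.0),
]
for k in (1, 2, 3):
    hyp, score = oracle_at_k(nbest, reference, k)
    print(k, hyp, round(score, 3))
```

In this toy list the model's top-1 is not the best candidate, but a better one already sits within the top 3. Comparing `k = 1` against larger `k` separates what the model prefers from what the search has found, which is the kind of distinction n-best reranking makes measurable.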

Exploiting Linguistic Resources for Neural Machine Translation Using Multi-task Learning
Jan Niehues | Eunah Cho
Proceedings of the Second Conference on Machine Translation

The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2017
Ngoc-Quan Pham | Jan Niehues | Thanh-Le Ha | Eunah Cho | Matthias Sperber | Alexander Waibel
Proceedings of the Second Conference on Machine Translation

2016

Lecture Translator - Speech translation framework for simultaneous lecture translation
Markus Müller | Thai Son Nguyen | Jan Niehues | Eunah Cho | Bastian Krüger | Thanh-Le Ha | Kevin Kilgour | Matthias Sperber | Mohammed Mediani | Sebastian Stüker | Alex Waibel
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Pre-Translation for Neural Machine Translation
Jan Niehues | Eunah Cho | Thanh-Le Ha | Alex Waibel
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Recently, the development of neural machine translation (NMT) has significantly improved the quality of automatic machine translation. While most sentences are more accurate and fluent than translations by statistical machine translation (SMT)-based systems, in some cases the NMT system produces translations with a completely different meaning. This is especially the case when rare words occur. For statistical machine translation, it has already been shown that significant gains can be achieved by simplifying the input in a preprocessing step. A commonly used example is the pre-reordering approach. In this work, we use phrase-based machine translation to pre-translate the input into the target language. A neural machine translation system then generates the final hypothesis using the pre-translation. Here, we use either only the output of the phrase-based machine translation (PBMT) system or a combination of the PBMT output and the source sentence. We evaluate the technique on the English to German translation task. Using this approach, we outperform both the PBMT system and the baseline neural MT system by up to 2 BLEU points. We also analyze how the quality of the initial system influences the final result.

Using Factored Word Representation in Neural Network Language Models
Jan Niehues | Thanh-Le Ha | Eunah Cho | Alex Waibel
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2016
Thanh-Le Ha | Eunah Cho | Jan Niehues | Mohammed Mediani | Matthias Sperber | Alexandre Allauzen | Alexander Waibel
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

The Karlsruhe Institute of Technology Translation Systems for the WMT 2015
Eunah Cho | Thanh-Le Ha | Jan Niehues | Teresa Herrmann | Mohammed Mediani | Yuqi Zhang | Alex Waibel
Proceedings of the Tenth Workshop on Statistical Machine Translation

The KIT-LIMSI Translation System for WMT 2015
Thanh-Le Ha | Quoc-Khanh Do | Eunah Cho | Jan Niehues | Alexandre Allauzen | François Yvon | Alex Waibel
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

Tight Integration of Speech Disfluency Removal into SMT
Eunah Cho | Jan Niehues | Alex Waibel
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

EU-BRIDGE MT: Combined Machine Translation
Markus Freitag | Stephan Peitz | Joern Wuebker | Hermann Ney | Matthias Huck | Rico Sennrich | Nadir Durrani | Maria Nadejde | Philip Williams | Philipp Koehn | Teresa Herrmann | Eunah Cho | Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation

The Karlsruhe Institute of Technology Translation Systems for the WMT 2014
Teresa Herrmann | Mohammed Mediani | Eunah Cho | Thanh-Le Ha | Jan Niehues | Isabel Slawik | Yuqi Zhang | Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation

A Corpus of Spontaneous Speech in Lectures: The KIT Lecture Corpus for Spoken Language Processing and Translation
Eunah Cho | Sarah Fünfer | Sebastian Stüker | Alex Waibel
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

With the increasing number of applications handling spontaneous speech, the need to process spoken language is growing. Handling speech disfluencies is one of the most challenging tasks in automatic speech processing. Because most applications are trained on well-formed written text, many issues arise when processing spontaneous speech due to its distinctive characteristics. Therefore, more data with annotated speech disfluencies will help adapt natural language processing applications, such as machine translation systems. To support this, we have annotated speech disfluencies in German lectures at KIT. In this paper, we describe how we annotated the disfluencies in the data and provide detailed statistics on the size of the corpus and the speakers. Moreover, we compare machine translation performance on source text containing disfluencies with the translation of source text from which different sorts of disfluencies, or all disfluencies, have been removed.

2013

The Karlsruhe Institute of Technology Translation Systems for the WMT 2013
Eunah Cho | Thanh-Le Ha | Mohammed Mediani | Jan Niehues | Teresa Herrmann | Isabel Slawik | Alex Waibel
Proceedings of the Eighth Workshop on Statistical Machine Translation

Joint WMT 2013 Submission of the QUAERO Project
Stephan Peitz | Saab Mansour | Matthias Huck | Markus Freitag | Hermann Ney | Eunah Cho | Teresa Herrmann | Mohammed Mediani | Jan Niehues | Alex Waibel | Alexander Allauzen | Quoc Khanh Do | Bianka Buschbeck | Tonio Wandmacher
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

The KIT Lecture Corpus for Speech Translation
Sebastian Stüker | Florian Kraft | Christian Mohr | Teresa Herrmann | Eunah Cho | Alex Waibel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Academic lectures offer valuable content, but often do not reach their full potential audience due to the language barrier. Human translations of lectures are too expensive to be widely used. Speech translation technology can be an affordable alternative in this case. State-of-the-art speech translation systems use statistical models that need to be trained on large amounts of in-domain data. To support the KIT lecture translation project in its effort to introduce speech translation technology in KIT's lecture halls, we have collected a corpus of German lectures at KIT. In this paper, we describe how we recorded the lectures and how we annotated them. We further give detailed statistics on the types of lectures in the corpus and its size. We collected the corpus so that it is suited not only for training a spoken language translation system in the traditional way, but also for researching techniques that let the translation system automatically and autonomously adapt itself to the varying topics and speakers of lectures.

The Karlsruhe Institute of Technology Translation Systems for the WMT 2012
Jan Niehues | Yuqi Zhang | Mohammed Mediani | Teresa Herrmann | Eunah Cho | Alex Waibel
Proceedings of the Seventh Workshop on Statistical Machine Translation