Judith Gaspers


2020

To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding?
Quynh Do | Judith Gaspers | Tobias Roeding | Melanie Bradford
Proceedings of the 28th International Conference on Computational Linguistics

This paper addresses the question as to what degree a BERT-based multilingual Spoken Language Understanding (SLU) model can transfer knowledge across languages. Through experiments we will show that, although it works substantially well even on distant language groups, there is still a gap to the ideal multilingual performance. In addition, we propose a novel BERT-based adversarial model architecture to learn language-shared and language-specific representations for multilingual SLU. Our experimental results prove that the proposed model is capable of narrowing the gap to the ideal multilingual performance.

2019

Cross-lingual Transfer Learning with Data Selection for Large-Scale Spoken Language Understanding
Quynh Do | Judith Gaspers
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

A typical cross-lingual transfer learning approach to boosting model performance on a language is to pre-train the model on all available supervised data from another language. However, in large-scale systems this leads to high training times and computational requirements. In addition, characteristic differences between the source and target languages raise the natural question of whether source data selection can improve the knowledge transfer. In this paper, we address this question and propose a simple but effective language-model-based source-language data selection method for cross-lingual transfer learning in large-scale spoken language understanding. The experimental results show that with data selection i) the amount of source data, and hence training time, is reduced significantly and ii) model performance is improved.

Cross-lingual Transfer Learning for Japanese Named Entity Recognition
Andrew Johnson | Penny Karanasou | Judith Gaspers | Dietrich Klakow
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

This work explores cross-lingual transfer learning (TL) for named entity recognition, focusing on bootstrapping Japanese from English. A deep neural network model is adopted and the best combination of weights to transfer is extensively investigated. Moreover, a novel approach is presented that overcomes linguistic differences between this language pair by romanizing a portion of the Japanese input. Experiments are conducted on external datasets, as well as internal large-scale real-world ones. Gains with TL are achieved for all evaluated cases. Finally, the influence on TL of the target dataset size and of the target tagset distribution is further investigated.

2018

Selecting Machine-Translated Data for Quick Bootstrapping of a Natural Language Understanding System
Judith Gaspers | Penny Karanasou | Rajen Chatterjee
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

This paper investigates the use of Machine Translation (MT) to bootstrap a Natural Language Understanding (NLU) system for a new language for the use case of a large-scale voice-controlled device. The goal is to decrease the cost and time needed to get an annotated corpus for the new language, while still having a large enough coverage of user requests. Different methods of filtering MT data in order to keep utterances that improve NLU performance, as well as language-specific post-processing methods, are investigated. These methods are tested in a large-scale NLU task by translating around 10 million training utterances from English to German. The results show a large improvement from using MT data over both a grammar-based baseline and an in-house data collection baseline, while greatly reducing the manual effort. Both filtering and post-processing approaches improve results further.

2015

Semantic parsing of speech using grammars learned with weak supervision
Judith Gaspers | Philipp Cimiano | Britta Wrede
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

A multimodal corpus for the evaluation of computational models for (grounded) language acquisition
Judith Gaspers | Maximilian Panzner | Andre Lemme | Philipp Cimiano | Katharina J. Rohlfing | Sebastian Wrede
Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL)