Ngoc Thang Vu


Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension
Ekta Sood | Simon Tannert | Diego Frassinelli | Andreas Bulling | Ngoc Thang Vu
Proceedings of the 24th Conference on Computational Natural Language Learning

While neural networks with attention mechanisms have achieved superior performance on many natural language processing tasks, it remains unclear to what extent learned attention resembles human visual attention. In this paper, we propose a new method that leverages eye-tracking data to investigate the relationship between human visual attention and neural attention in machine reading comprehension. To this end, we introduce a novel 23-participant eye-tracking dataset, MQA-RC, in which participants read movie plots and answered pre-defined questions. We compare state-of-the-art networks based on long short-term memory (LSTM), convolutional neural models (CNN) and XLNet Transformer architectures. We find that, for the LSTM and CNN models, higher similarity to human attention correlates significantly with performance. However, we show that this relationship does not hold for the XLNet models, even though XLNet performs best on this challenging task. Our results suggest that different architectures learn rather different neural attention strategies and that similarity of neural to human attention does not guarantee best performance.

Fast and Accurate Non-Projective Dependency Tree Linearization
Xiang Yu | Simon Tannert | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose a graph-based method to tackle the dependency tree linearization task. We formulate the task as a Traveling Salesman Problem (TSP), and use a biaffine attention model to calculate the edge costs. We facilitate the decoding by solving the TSP for each subtree and combining the solution into a projective tree. We then design a transition system as post-processing, inspired by non-projective transition-based parsing, to obtain non-projective sentences. Our proposed method outperforms the state-of-the-art linearizer while being 10 times faster in training and decoding.

ADVISER: A Toolkit for Developing Multi-modal, Multi-domain and Socially-engaged Conversational Agents
Chia-Yu Li | Daniel Ortega | Dirk Väth | Florian Lux | Lindsey Vanderlyn | Maximilian Schmidt | Michael Neumann | Moritz Völkel | Pavel Denisov | Sabrina Jenne | Zorica Kacarevic | Ngoc Thang Vu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present ADVISER - an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g. emotion recognition, engagement level prediction and backchanneling) conversational agents. The final Python-based implementation of our toolkit is flexible, easy to use, and easy to extend not only for technically experienced users, such as machine learning researchers, but also for less technically experienced users, such as linguists or cognitive scientists, thereby providing a flexible platform for collaborative research.

Cairo Student Code-Switch (CSCS) Corpus: An Annotated Egyptian Arabic-English Corpus
Mohamed Balabel | Injy Hamed | Slim Abdennadher | Ngoc Thang Vu | Özlem Çetinoğlu
Proceedings of the 12th Language Resources and Evaluation Conference

Code-switching has become a prevalent phenomenon across many communities. It poses a challenge to NLP researchers, mainly due to the lack of available data needed for training and testing applications. In this paper, we introduce a new resource: a corpus of Egyptian-Arabic code-switch speech data that is fully tokenized, lemmatized and annotated for part-of-speech tags. Besides the corpus itself, we provide annotation guidelines to address the unique challenges of annotating code-switch data. Another challenge that we address is the fact that Egyptian Arabic orthography and grammar are not standardized.

ArzEn: A Speech Corpus for Code-switched Egyptian Arabic-English
Injy Hamed | Ngoc Thang Vu | Slim Abdennadher
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we present our ArzEn corpus, an Egyptian Arabic-English code-switching (CS) spontaneous speech corpus. The corpus is collected through informal interviews with 38 Egyptian bilingual university students and employees held in a soundproof room. A total of 12 hours are recorded, transcribed, validated and sentence segmented. The corpus is mainly designed to be used in Automatic Speech Recognition (ASR) systems; however, it also provides a useful resource for analyzing the CS phenomenon from linguistic, sociological, and psychological perspectives. In this paper, we first discuss the CS phenomenon in Egypt and the factors that gave rise to the current language. We then provide a detailed description of how the corpus was collected, giving an overview of the participants involved. We also present statistics on the CS involved in the corpus, as well as a summary of the effort exerted in the corpus development, in terms of the number of hours required for transcription, validation, segmentation and speaker annotation. Finally, we discuss some factors contributing to the complexity of the corpus, as well as Arabic-English CS behaviour that could pose potential challenges to ASR systems.

A Two-stage Model for Slot Filling in Low-resource Settings: Domain-agnostic Non-slot Reduction and Pretrained Contextual Embeddings
Cennet Oguz | Ngoc Thang Vu
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

Learning-based slot filling - a key component of spoken language understanding systems - typically requires a large amount of in-domain hand-labeled data for training. In this paper, we propose a novel two-stage model architecture that can be trained with only a few in-domain hand-labeled examples. The first step is designed to remove non-slot tokens (i.e., O labeled tokens), as they introduce noise in the input of slot filling models. This step is domain-agnostic and can therefore be trained by exploiting out-of-domain data. The second step identifies slot names only for slot tokens by using state-of-the-art pretrained contextual embeddings such as ELMo and BERT. We show that our approach outperforms other state-of-the-art systems on the SNIPS benchmark dataset.

Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning
Daniel Grießhaber | Johannes Maucher | Ngoc Thang Vu
Proceedings of the 28th International Conference on Computational Linguistics

Recently, leveraging pre-trained Transformer-based language models in downstream, task-specific models has advanced state-of-the-art results in natural language understanding tasks. However, little research has explored the suitability of this approach in low-resource settings with fewer than 1,000 training data points. In this work, we explore fine-tuning methods of BERT - a pre-trained Transformer-based language model - by utilizing pool-based active learning to speed up training while keeping the cost of labeling new data constant. Our experimental results on the GLUE data set show an advantage in model performance by maximizing the approximate knowledge gain of the model when querying from the pool of unlabeled data. Finally, we demonstrate and analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters, making it more suitable for low-resource settings.
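
The query step of pool-based active learning can be illustrated with a minimal sketch. It assumes predictive entropy as the acquisition score, which is a common stand-in and not necessarily the knowledge-gain approximation used in the paper; the names `entropy` and `select_queries` and the toy probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs: List[Sequence[float]], batch_size: int) -> List[int]:
    """Return indices of the `batch_size` pool items with the highest entropy.

    Assumption: entropy is used here as the acquisition function; any other
    uncertainty or information-gain score could be substituted.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


# Toy model predictions for four unlabeled examples (two classes each).
pool = [[0.98, 0.02], [0.55, 0.45], [0.70, 0.30], [0.50, 0.50]]
print(select_queries(pool, 2))  # -> [3, 1]
```

In a full loop, the selected examples would be labeled, added to the training set, and the model fine-tuned again before the next query round.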

F1 is Not Enough! Models and Evaluation Towards User-Centered Explainable Question Answering
Hendrik Schuff | Heike Adel | Ngoc Thang Vu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Explainable question answering systems predict an answer together with an explanation showing why the answer has been selected. The goal is to enable users to assess the correctness of the system and understand its reasoning process. However, we show that current models and evaluation settings have shortcomings regarding the coupling of answer and explanation which might cause serious issues in user experience. As a remedy, we propose a hierarchical model and a new regularization term to strengthen the answer-explanation coupling as well as two evaluation scores to quantify the coupling. We conduct experiments on the HOTPOTQA benchmark data set and perform a user study. The user study shows that our models increase the ability of the users to judge the correctness of the system and that scores like F1 are not enough to estimate the usefulness of a model in a practical setting with human users. Our scores are better aligned with user experience, making them promising candidates for model selection.

Ensemble Self-Training for Low-Resource Languages: Grapheme-to-Phoneme Conversion and Morphological Inflection
Xiang Yu | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

We present an iterative data augmentation framework, which trains and searches for an optimal ensemble and simultaneously annotates new training data in a self-training style. We apply this framework to two SIGMORPHON 2020 shared tasks: grapheme-to-phoneme conversion and morphological inflection. With very simple base models in the ensemble, we rank first and fourth in these two tasks, respectively. We show in the analysis that our system works especially well on low-resource languages.

IMSurReal Too: IMS in the Surface Realization Shared Task 2020
Xiang Yu | Simon Tannert | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the Third Workshop on Multilingual Surface Realisation

We introduce the IMS contribution to the Surface Realization Shared Task 2020. The new system achieves substantial improvement over the state-of-the-art system from last year, mainly due to a better token representation and a better linearizer, as well as a simple ensembling approach. We also experiment with data augmentation, which brings some additional performance gain. The system is available at


IMSurReal: IMS at the Surface Realization Shared Task 2019
Xiang Yu | Agnieszka Falenska | Marina Haid | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

We introduce the IMS contribution to the Surface Realization Shared Task 2019. Our submission achieves state-of-the-art performance without using any external resources. The system takes a pipeline approach consisting of five steps: linearization, completion, inflection, contraction, and detokenization. We compare the performance of our linearization algorithm with two external baselines and report results for each step in the pipeline. Furthermore, we perform a detailed error analysis revealing a correlation between word order freedom and the difficulty of the linearization task.

ADVISER: A Dialog System Framework for Education & Research
Daniel Ortega | Dirk Väth | Gianna Weber | Lindsey Vanderlyn | Maximilian Schmidt | Moritz Völkel | Zorica Karacevic | Ngoc Thang Vu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

In this paper, we present ADVISER - an open-source dialog system framework for education and research purposes. This system supports multi-domain task-oriented conversations in two languages. It additionally provides a flexible architecture in which modules can be arbitrarily combined or exchanged - allowing for easy switching between rule-based and neural network-based implementations. Furthermore, ADVISER offers a transparent, user-friendly framework designed for interdisciplinary collaboration: from a flexible back end, allowing easy integration of new features, to an intuitive graphical user interface supporting nontechnical users.

Learning the Dyck Language with Attention-based Seq2Seq Models
Xiang Yu | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

The generalized Dyck language has been used to analyze the ability of Recurrent Neural Networks (RNNs) to learn context-free grammars (CFGs). Recent studies draw conflicting conclusions on their performance, especially regarding the generalizability of the models with respect to the depth of recursion. In this paper, we revisit several common models and experimental settings, and discuss the potential problems of the tasks and analyses. Furthermore, we explore the use of attention mechanisms within the seq2seq framework to learn the Dyck language, which could compensate for the limited encoding ability of RNNs. Our findings reveal that attention mechanisms still cannot truly generalize over the recursion depth, although they perform much better than other models on the closing bracket tagging task. Moreover, this also suggests that this commonly used task is not sufficient to test a model’s understanding of CFGs.
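
For readers unfamiliar with the Dyck language, a minimal sketch of a membership check that also reports recursion depth may help. It assumes a two-bracket alphabet; the function name `dyck_depth` and the `PAIRS` table are illustrative and not taken from the paper.

```python
from typing import Dict, List, Optional

# Assumed bracket alphabet: maps each closing bracket to its opener.
PAIRS: Dict[str, str] = {")": "(", "]": "["}


def dyck_depth(s: str, pairs: Dict[str, str] = PAIRS) -> Optional[int]:
    """Return the maximum nesting depth if `s` is well-bracketed, else None."""
    openers = set(pairs.values())
    stack: List[str] = []
    max_depth = 0
    for ch in s:
        if ch in openers:
            stack.append(ch)
            max_depth = max(max_depth, len(stack))
        elif ch in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return None
        else:
            return None  # symbol outside the bracket alphabet
    return max_depth if not stack else None


print(dyck_depth("([])[]"))  # -> 2
print(dyck_depth("([)]"))    # -> None
```

Generalization over recursion depth, as discussed in the abstract, would correspond to testing a model on strings whose depth exceeds the maximum seen in training.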

To Combine or Not To Combine? A Rainbow Deep Reinforcement Learning Agent for Dialog Policies
Dirk Väth | Ngoc Thang Vu
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

In this paper, we explore state-of-the-art deep reinforcement learning methods for dialog policy training such as prioritized experience replay, double deep Q-Networks, dueling network architectures and distributional learning. Our main findings show that each individual method improves the rewards and the task success rate but combining these methods in a Rainbow agent, which performs best across tasks and environments, is a non-trivial task. We, therefore, provide insights about the influence of each method on the combination and how to combine them to form a Rainbow agent.
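
One of the listed components, double deep Q-Networks, can be sketched in a few lines. This shows only the standard double-DQN bootstrap target, not the paper's dialog policy setup; the function name and the toy Q-values are hypothetical.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    gamma: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
) -> float:
    """Compute the double-DQN target for one transition.

    The online network chooses the next action; the target network
    evaluates it, which reduces the overestimation of plain Q-learning.
    """
    if done:
        return reward  # no bootstrapping from a terminal state
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Toy next-state Q-values for three actions (assumed numbers).
y = double_dqn_target(
    reward=1.0,
    gamma=0.5,
    done=False,
    q_online_next=[0.2, 0.8, 0.5],
    q_target_next=[1.0, 0.5, 2.0],
)
print(y)  # -> 1.25
```

A plain DQN target would instead take the maximum of `q_target_next` directly, giving 2.0 in this toy case.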

Head-First Linearization with Tree-Structured Representation
Xiang Yu | Agnieszka Falenska | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 12th International Conference on Natural Language Generation

We present a dependency tree linearization model with two novel components: (1) a tree-structured encoder based on bidirectional Tree-LSTM that propagates information first bottom-up then top-down, which allows each token to access information from the entire tree; and (2) a linguistically motivated head-first decoder that emphasizes the central role of the head and linearizes the subtree by incrementally attaching the dependents on both sides of the head. With the new encoder and decoder, we reach state-of-the-art performance on the Surface Realization Shared Task 2018 dataset, outperforming not only the shared task participants, but also previous state-of-the-art systems (Bohnet et al., 2011; Puduppully et al., 2016). Furthermore, we analyze the power of the tree-structured encoder with a probing task and show that it is able to recognize the topological relation between any pair of tokens in a tree.


Comparing Attention-Based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension
Matthias Blohm | Glorianna Jagfeld | Ekta Sood | Xiang Yu | Ngoc Thang Vu
Proceedings of the 22nd Conference on Computational Natural Language Learning

We propose a machine reading comprehension model based on the compare-aggregate framework with two-staged attention that achieves state-of-the-art results on the MovieQA question answering dataset. To investigate the limitations of our model as well as the behavioral difference between convolutional and recurrent neural networks, we generate adversarial examples to confuse the model and compare to human performance. Furthermore, we assess the generalizability of our model by analyzing its differences to human inference, drawing upon insights from cognitive science.

Introducing Two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness
Kim Anh Nguyen | Sabine Schulte im Walde | Ngoc Thang Vu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We present two novel datasets for the low-resource language Vietnamese to assess models of semantic similarity: ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity. ViSim-400 provides degrees of similarity across five semantic relations, as rated by human judges. The two datasets are verified through standard co-occurrence and neural network models, showing results comparable to the respective English datasets.

Addressing Low-Resource Scenarios with Character-aware Embeddings
Sean Papay | Sebastian Padó | Ngoc Thang Vu
Proceedings of the Second Workshop on Subword/Character LEvel Models

Most modern approaches to computing word embeddings assume the availability of text corpora with billions of words. In this paper, we explore a setup where only corpora with millions of words are available, and many words in any new text are out of vocabulary. This setup is of both practical interest (modeling the situation for specific domains and low-resource languages) and psycholinguistic interest, since it corresponds much more closely to the actual experiences and challenges of human language learning and use. We compare standard skip-gram word embeddings with character-based embeddings on word relatedness prediction. Skip-grams excel on large corpora, while character-based embeddings do well on small corpora generally and rare and complex words specifically. The models can be combined easily.

Approximate Dynamic Oracle for Dependency Parsing with Reinforcement Learning
Xiang Yu | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

We present a general approach with reinforcement learning (RL) to approximate dynamic oracles for transition systems where exact dynamic oracles are difficult to derive. We treat oracle parsing as a reinforcement learning problem, design the reward function inspired by the classical dynamic oracle, and use Deep Q-Learning (DQN) techniques to train the oracle with gold trees as features. The combination of a priori knowledge and data-driven methods enables an efficient dynamic oracle, which improves the parser performance over static oracles in several transition systems.

Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity
Glorianna Jagfeld | Sabrina Jenne | Ngoc Thang Vu
Proceedings of the 11th International Conference on Natural Language Generation

We present a comparison of word-based and character-based sequence-to-sequence models for data-to-text natural language generation, which generate natural language descriptions for structured inputs. On the datasets of two recent generation challenges, our models achieve comparable or better automatic evaluation results than the best challenge submissions. Subsequent detailed statistical and human analyses shed light on the differences between the two input representations and the diversity of the generated texts. In a controlled experiment with synthetic training data generated from templates, we demonstrate the ability of neural models to learn novel combinations of the templates and thereby generalize beyond the linguistic structures they were trained on.


Character Composition Model with Convolutional Neural Networks for Dependency Parsing on Morphologically Rich Languages
Xiang Yu | Ngoc Thang Vu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present a transition-based dependency parser that uses a convolutional neural network to compose word representations from characters. The character composition model shows great improvement over the word-lookup model, especially for parsing agglutinative languages. These improvements are even better than using pre-trained word embeddings from extra data. On the SPMRL data sets, our system outperforms the previous best greedy parser (Ballesteros et al., 2015) by a margin of 3% on average.

Distinguishing Antonyms and Synonyms in a Pattern-based Neural Network
Kim Anh Nguyen | Sabine Schulte im Walde | Ngoc Thang Vu
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Distinguishing between antonyms and synonyms is a key task to achieve high performance in NLP systems. While they are notoriously difficult to distinguish by distributional co-occurrence models, pattern-based methods have proven effective to differentiate between the relations. In this paper, we present a novel neural network model AntSynNET that exploits lexico-syntactic patterns from syntactic parse trees. In addition to the lexical and syntactic information, we successfully integrate the distance between the related words along the syntactic path as a new pattern feature. The results from classification experiments show that AntSynNET improves the performance over prior pattern-based methods.

A General-Purpose Tagger with Convolutional Neural Networks
Xiang Yu | Agnieszka Falenska | Ngoc Thang Vu
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We present a general-purpose tagger based on convolutional neural networks (CNN), used for both composing word vectors and encoding context information. The CNN tagger is robust across different tagging tasks: without task-specific tuning of hyper-parameters, it achieves state-of-the-art results in part-of-speech tagging, morphological tagging and supertagging. The CNN tagger is also robust against the out-of-vocabulary problem; it performs well on artificially unnormalized texts.

Encoding Word Confusion Networks with Recurrent Neural Networks for Dialog State Tracking
Glorianna Jagfeld | Ngoc Thang Vu
Proceedings of the Workshop on Speech-Centric Natural Language Processing

This paper presents our novel method to encode word confusion networks, which can represent a rich hypothesis space of automatic speech recognition systems, via recurrent neural networks. We demonstrate the utility of our approach for the task of dialog state tracking in spoken dialog systems that relies on automatic speech recognition output. Encoding confusion networks outperforms encoding the best hypothesis of the automatic speech recognition in a neural system for dialog state tracking on the well-known second Dialog State Tracking Challenge dataset.

Enriching ASR Lattices with POS Tags for Dependency Parsing
Moritz Stiefel | Ngoc Thang Vu
Proceedings of the Workshop on Speech-Centric Natural Language Processing

Parsing speech requires a richer representation than 1-best or n-best hypotheses, e.g. lattices. Moreover, previous work shows that part-of-speech (POS) tags are a valuable resource for parsing. In this paper, we therefore explore a joint modeling approach of automatic speech recognition (ASR) and POS tagging to enrich ASR word lattices. To that end, we manipulate the ASR process from the pronouncing dictionary onward to use word-POS pairs instead of words. We evaluate ASR, POS tagging and dependency parsing (DP) performance demonstrating a successful lattice-based integration of ASR and POS tagging.

Improving coreference resolution with automatically predicted prosodic information
Ina Roesiger | Sabrina Stehwien | Arndt Riester | Ngoc Thang Vu
Proceedings of the Workshop on Speech-Centric Natural Language Processing

Adding manually annotated prosodic information, specifically pitch accents and phrasing, to the typical text-based feature set for coreference resolution has previously been shown to have a positive effect on German data. Practical applications on spoken language, however, would rely on automatically predicted prosodic information. In this paper we predict pitch accents (and phrase boundaries) using a convolutional neural network (CNN) model from acoustic features extracted from the speech signal. After an assessment of the quality of these automatic prosodic annotations, we show that they also significantly improve coreference resolution.

Neural-based Context Representation Learning for Dialog Act Classification
Daniel Ortega | Ngoc Thang Vu
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

We explore context representation learning methods in neural-based models for dialog act classification. We propose and extensively compare different methods which combine recurrent neural network architectures and attention mechanisms (AMs) at different context levels. Our experimental results on two benchmark datasets show consistent improvements compared to the models without contextual information and reveal that the most suitable AM in the architecture depends on the nature of the dataset.

Hierarchical Embeddings for Hypernymy Detection and Directionality
Kim Anh Nguyen | Maximilian Köper | Sabine Schulte im Walde | Ngoc Thang Vu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a novel neural model HyperVec to learn hierarchical embeddings for hypernymy detection and directionality. While previous embeddings have shown limitations on prototypical hypernyms, HyperVec represents an unsupervised measure where embeddings are learned in a specific order and capture the hypernym–hyponym distributional hierarchy. Moreover, our model is able to generalize over unseen hypernymy pairs, when using only small sets of training data, and by mapping to other languages. Results on benchmark datasets show that HyperVec outperforms both state-of-the-art unsupervised measures and embedding models on hypernymy detection and directionality, and on predicting graded lexical entailment.


Combining Recurrent and Convolutional Neural Networks for Relation Classification
Ngoc Thang Vu | Heike Adel | Pankaj Gupta | Hinrich Schütze
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Neural-based Noise Filtering from Word Embeddings
Kim Anh Nguyen | Sabine Schulte im Walde | Ngoc Thang Vu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Word embeddings have been demonstrated to benefit NLP tasks impressively. Yet, there is room for improvements in the vector representations, because current word embeddings typically contain unnecessary information, i.e., noise. We propose two novel models to improve word embeddings by unsupervised learning, in order to yield word denoising embeddings. The word denoising embeddings are obtained by strengthening salient information and weakening noise in the original word embeddings, based on a deep feed-forward neural network filter. Results from benchmark tasks show that the filtered word denoising embeddings outperform the original word embeddings.

Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction
Kim Anh Nguyen | Sabine Schulte im Walde | Ngoc Thang Vu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Towards a text analysis system for political debates
Dieu-Thu Le | Ngoc Thang Vu | Andre Blessing
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Challenges of Computational Processing of Code-Switching
Özlem Çetinoğlu | Sarah Schulz | Ngoc Thang Vu
Proceedings of the Second Workshop on Computational Approaches to Code Switching


A Linguistically Informed Convolutional Neural Network
Sebastian Ebert | Ngoc Thang Vu | Hinrich Schütze
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

CIS-positive: A Combination of Convolutional Neural Networks and Support Vector Machines for Sentiment Analysis in Twitter
Sebastian Ebert | Ngoc Thang Vu | Hinrich Schütze
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)


Exploration of the Impact of Maximum Entropy in Recurrent Neural Network Language Models for Code-Switching Speech
Ngoc Thang Vu | Tanja Schultz
Proceedings of the First Workshop on Computational Approaches to Code Switching


Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling
Heike Adel | Ngoc Thang Vu | Tanja Schultz
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)