Nurul Lubis


2020

Knowing What You Know: Calibrating Dialogue Belief State Distributions via Ensembles
Carel van Niekerk | Michael Heck | Christian Geishauser | Hsien-chin Lin | Nurul Lubis | Marco Moresi | Milica Gasic
Findings of the Association for Computational Linguistics: EMNLP 2020

The ability to accurately track what happens during a conversation is essential for the performance of a dialogue system. Current state-of-the-art multi-domain dialogue state trackers achieve just over 55% accuracy on the current go-to benchmark, which means that in almost every second dialogue turn they place full confidence in an incorrect dialogue state. Belief trackers, on the other hand, maintain a distribution over possible dialogue states. However, they fall short in performance compared to dialogue state trackers and do not produce well-calibrated distributions. In this work we present state-of-the-art performance in calibration for multi-domain dialogue belief trackers using a calibrated ensemble of models. Our resulting dialogue belief tracker also outperforms previous dialogue belief tracking models in terms of accuracy.

LAVA: Latent Action Spaces via Variational Auto-encoding for Dialogue Policy Optimization
Nurul Lubis | Christian Geishauser | Michael Heck | Hsien-chin Lin | Marco Moresi | Carel van Niekerk | Milica Gasic
Proceedings of the 28th International Conference on Computational Linguistics

Reinforcement learning (RL) can enable task-oriented dialogue systems to steer the conversation towards successful task completion. In an end-to-end setting, a response can be constructed in a word-level sequential decision-making process with the entire system vocabulary as the action space. Policies trained in such a fashion do not require expert-defined action spaces, but they have to deal with large action spaces and long trajectories, making RL impractical. Using the latent space of a variational model as the action space alleviates this problem. However, current approaches use an uninformed prior for training and optimize the latent distribution based solely on the context. It is therefore unclear whether the latent representation truly encodes the characteristics of different actions. In this paper, we explore three ways of leveraging an auxiliary task to shape the latent variable distribution: via pre-training, via an informed prior, and via multitask learning. We choose response auto-encoding as the auxiliary task, as it captures the generative factors of dialogue responses while requiring low computational cost and neither additional data nor labels. Our approach yields more action-characterized latent representations, which support end-to-end dialogue policy optimization and achieve state-of-the-art success rates. These results warrant a more widespread use of RL in end-to-end dialogue models.

Out-of-Task Training for Dialog State Tracking Models
Michael Heck | Christian Geishauser | Hsien-chin Lin | Nurul Lubis | Marco Moresi | Carel van Niekerk | Milica Gasic
Proceedings of the 28th International Conference on Computational Linguistics

Dialog state tracking (DST) suffers from severe data sparsity. While many natural language processing (NLP) tasks benefit from transfer learning and multi-task learning, in dialog these methods are limited by the amount of available data and by the specificity of dialog applications. In this work, we successfully utilize non-dialog data from unrelated NLP tasks to train dialog state trackers. This opens the door to the abundance of unrelated NLP corpora to mitigate the data sparsity issue inherent to DST.

TripPy: A Triple Copy Strategy for Value Independent Neural Dialog State Tracking
Michael Heck | Carel van Niekerk | Nurul Lubis | Christian Geishauser | Hsien-Chin Lin | Marco Moresi | Milica Gasic
Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue

Task-oriented dialog systems rely on dialog state tracking (DST) to monitor the user’s goal during the course of an interaction. Multi-domain and open-vocabulary settings complicate the task considerably and demand scalable solutions. In this paper we present a new approach to DST which makes use of various copy mechanisms to fill slots with values. Our model has no need to maintain a list of candidate values. Instead, all values are extracted from the dialog context on the fly. A slot is filled by one of three copy mechanisms: (1) span prediction may extract values directly from the user input; (2) a value may be copied from a system inform memory that keeps track of the system’s inform operations; and (3) a value may be copied over from a different slot that is already contained in the dialog state to resolve coreferences within and across domains. Our approach combines the advantages of span-based slot filling methods with memory methods to avoid the use of value picklists altogether. We argue that our strategy simplifies the DST task while at the same time achieving state-of-the-art performance on various popular evaluation sets, including MultiWOZ 2.1, where we achieve a joint goal accuracy beyond 55%.

2018

Unsupervised Counselor Dialogue Clustering for Positive Emotion Elicitation in Neural Dialogue System
Nurul Lubis | Sakriani Sakti | Koichiro Yoshino | Satoshi Nakamura
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Positive emotion elicitation seeks to improve the user’s emotional state through dialogue system interaction, where a chat-based scenario is layered with an implicit goal to address the user’s emotional needs. Standard neural dialogue system approaches still fall short in this situation, as they tend to generate only short, generic responses. Learning from expert actions is critical, as these potentially differ from standard dialogue acts. In this paper, we propose using a hierarchical neural network for response generation that is conditioned on 1) the expert’s action, 2) the dialogue context, and 3) the user emotion, encoded from the user input. We construct a corpus of interactions between a counselor and 30 participants following a negative emotional exposure to learn expert actions and responses in a positive emotion elicitation scenario. Instead of relying on expensive, labor-intensive, and often ambiguous human annotations, we cluster the expert’s responses in an unsupervised manner and use the resulting labels to train the network. Our experiments and evaluation show that the proposed approach yields lower perplexity and generates a larger variety of responses.

2016

Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition
Nurul Lubis | Randy Gomez | Sakriani Sakti | Keisuke Nakamura | Koichiro Yoshino | Satoshi Nakamura | Kazuhiro Nakadai
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Emotional aspects play a vital role in making human communication a rich and dynamic experience. As we introduce more automated systems into our daily lives, it becomes increasingly important to incorporate emotion to provide as natural an interaction as possible. To achieve this, rich sets of labeled emotional data are a prerequisite. However, in Japanese, existing emotion databases are still limited to unimodal and bimodal corpora. Since emotion is expressed not only through speech but also visually at the same time, it is essential to include multiple modalities in an observation. In this paper, we present the first audio-visual emotion corpus in Japanese, collected from 14 native speakers. The corpus contains 100 minutes of annotated and transcribed material. We performed preliminary emotion recognition experiments on the corpus and achieved an accuracy of 61.42% for five classes of emotion.