Joelle Pineau


2020

Learning an Unreferenced Metric for Online Dialogue Evaluation
Koustuv Sinha | Prasanna Parthasarathi | Jasmine Wang | Ryan Lowe | William L. Hamilton | Joelle Pineau
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Evaluating the quality of a dialogue interaction between two agents is a difficult task, especially in open-domain chit-chat style dialogue. There have been recent efforts to develop automatic dialogue evaluation metrics, but most of them do not generalize to unseen datasets and/or need a human-generated reference response during inference, making them infeasible for online evaluation. Here, we propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances, and leverages the temporal transitions that exist between them. We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.

2019

CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text
Koustuv Sinha | Shagun Sodhani | Jin Dong | Joelle Pineau | William L. Hamilton
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The recent success of natural language understanding (NLU) systems has been troubled by results highlighting the failure of these models to generalize in a systematic and robust way. In this work, we introduce a diagnostic benchmark suite, named CLUTRR, to clarify some key issues related to the robustness and systematicity of NLU systems. Motivated by the classic work on inductive logic programming, CLUTRR requires that an NLU system infer kinship relations between characters in short stories. Successful performance on this task requires both extracting relationships between entities and inferring the logical rules governing these relationships. CLUTRR allows us to precisely measure a model’s ability for systematic generalization by evaluating on held-out combinations of logical rules, and allows us to evaluate a model’s robustness by adding curated noise facts. Our empirical results highlight a substantial performance gap between state-of-the-art NLU models (e.g., BERT and MAC) and a graph neural network model that works directly with symbolic inputs, with the graph-based model exhibiting both stronger generalization and greater robustness.

Seeded self-play for language learning
Abhinav Gupta | Ryan Lowe | Jakob Foerster | Douwe Kiela | Joelle Pineau
Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

How can we teach artificial agents to use human language flexibly to solve problems in real-world environments? We have an example of this in nature: human babies eventually learn to use human language to solve problems, and they are taught with an adult human-in-the-loop. Unfortunately, current machine learning methods (e.g. from deep reinforcement learning) are too data-inefficient to learn language in this way. An outstanding goal is finding an algorithm with a suitable ‘language learning prior’ that allows it to learn human language, while minimizing the number of on-policy human interactions. In this paper, we propose to learn such a prior in simulation using an approach we call Learning to Learn to Communicate (L2C). Specifically, in L2C we train a meta-learning agent in simulation to interact with populations of pre-trained agents, each with their own distinct communication protocol. Once the meta-learning agent is able to quickly adapt to each population of agents, it can be deployed in new populations, including populations speaking human language. Our key insight is that such populations can be obtained via self-play, after pre-training agents with imitation learning on a small amount of off-policy human language data. We call this latter technique Seeded Self-Play (S2P). Our preliminary experiments show that agents trained with L2C and S2P need fewer on-policy samples to learn a compositional language in a Lewis signaling game.

2018

Extending Neural Generative Conversational Model using External Knowledge Sources
Prasanna Parthasarathi | Joelle Pineau
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The use of connectionist approaches in conversational agents has been progressing rapidly due to the availability of large corpora. However, current generative dialogue models often lack coherence and are content-poor. This work proposes an architecture to incorporate unstructured knowledge sources to enhance the next utterance prediction in chit-chat-style generative dialogue models. We focus on Sequence-to-Sequence (Seq2Seq) conversational agents trained with the Reddit News dataset, and consider incorporating external knowledge from Wikipedia summaries as well as from the NELL knowledge base. Our experiments show faster training time and improved perplexity when leveraging external knowledge.

2017

Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses
Ryan Lowe | Michael Noseworthy | Iulian Vlad Serban | Nicolas Angelard-Gontier | Yoshua Bengio | Joelle Pineau
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response quality (Liu et al., 2016). Yet having an accurate automatic evaluation procedure is crucial for dialogue research, as it allows rapid prototyping and testing of new models with fewer expensive human evaluations. In response to this challenge, we formulate automatic dialogue evaluation as a learning problem. We present an evaluation model (ADEM) that learns to predict human-like scores to input responses, using a new dataset of human response scores. We show that the ADEM model’s predictions correlate significantly, and at a level much higher than word-overlap metrics such as BLEU, with human judgements at both the utterance and system level. We also show that ADEM can generalize to evaluating dialogue models unseen during training, an important step for automatic dialogue evaluation.

Piecewise Latent Variables for Neural Variational Text Processing
Iulian Vlad Serban | Alexander Ororbia II | Joelle Pineau | Aaron Courville
Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing

Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.

MACA: A Modular Architecture for Conversational Agents
Hoai Phuoc Truong | Prasanna Parthasarathi | Joelle Pineau
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

We propose a software architecture designed to ease the implementation of dialogue systems. The Modular Architecture for Conversational Agents (MACA) uses a plug-n-play style that allows quick prototyping, thereby facilitating the development of new techniques and the reproduction of previous work. The architecture separates the domain of the conversation from the agent’s dialogue strategy, and as such can be easily extended to multiple domains. MACA provides tools to host dialogue agents on Amazon Mechanical Turk (mTurk) for data collection and allows processing of other sources of training data. The current version of the framework already incorporates several domains and existing dialogue strategies from the recent literature.

Predicting Success in Goal-Driven Human-Human Dialogues
Michael Noseworthy | Jackie Chi Kit Cheung | Joelle Pineau
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

In goal-driven dialogue systems, success is often defined based on a structured definition of the goal. This requires that the dialogue system be constrained to handle a specific class of goals and that there be a mechanism to measure success with respect to that goal. However, in many human-human dialogues the diversity of goals makes it infeasible to define success in such a way. To address this scenario, we consider the task of automatically predicting success in goal-driven human-human dialogues using only the information communicated between participants in the form of text. We build a dataset from stackoverflow.com which consists of exchanges between two users in the technical domain where ground-truth success labels are available. We then propose a turn-based hierarchical neural network model that can be used to predict success without requiring a structured goal definition. We show this model outperforms rule-based heuristics and other baselines as it is able to detect patterns over the course of a dialogue and capture notions such as gratitude.

Piecewise Latent Variables for Neural Variational Text Processing
Iulian Vlad Serban | Alexander G. Ororbia | Joelle Pineau | Aaron Courville
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.

2016

How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
Chia-Wei Liu | Ryan Lowe | Iulian Serban | Mike Noseworthy | Laurent Charlin | Joelle Pineau
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

On the Evaluation of Dialogue Systems with Next Utterance Classification
Ryan Lowe | Iulian Vlad Serban | Michael Noseworthy | Laurent Charlin | Joelle Pineau
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2015

The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems
Ryan Lowe | Nissan Pow | Iulian Serban | Joelle Pineau
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2000

Spoken Dialogue Management Using Probabilistic Reasoning
Nicholas Roy | Joelle Pineau | Sebastian Thrun
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics