Aaron Courville


2020

Recursive Top-Down Production for Sentence Generation with Latent Trees
Shawn Tan | Yikang Shen | Alessandro Sordoni | Aaron Courville | Timothy J. O’Donnell
Findings of the Association for Computational Linguistics: EMNLP 2020

We model the recursive production property of context-free grammars for natural and synthetic languages. To this end, we present a dynamic programming algorithm that marginalises over latent binary tree structures with N leaves, allowing us to compute the likelihood of a sequence of N tokens under a latent tree model, which we maximise to train a recursive neural function. We demonstrate performance on two synthetic tasks: SCAN, where it outperforms previous models on the LENGTH split, and English question formation, where it performs comparably to decoders with the ground-truth tree structure. We also present experimental results on German-English translation on the Multi30k dataset, and qualitatively analyse the induced tree structures our model learns for the SCAN tasks and the German-English translation task.

Supervised Seeded Iterated Learning for Interactive Language Learning
Yuchen Lu | Soumye Singhal | Florian Strub | Olivier Pietquin | Aaron Courville
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Language drift has been one of the major obstacles to training language models through interaction. When word-based conversational agents are trained towards completing a task, they tend to invent their own language rather than leveraging natural language. In recent literature, two general methods partially counter this phenomenon: Supervised Selfplay (S2P) and Seeded Iterated Learning (SIL). While S2P jointly trains interactive and supervised losses to counter the drift, SIL changes the training dynamics to prevent language drift from occurring. In this paper, we first highlight their respective weaknesses, i.e., late-stage training collapses and higher negative likelihood when evaluated on a human corpus. Given these observations, we introduce Supervised Seeded Iterated Learning (SSIL) to combine both methods to minimize their respective weaknesses. We then show the effectiveness of SSIL in the language-drift translation game.

2018

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance
Yikang Shen | Zhouhan Lin | Athul Paul Jacob | Alessandro Sordoni | Aaron Courville | Yoshua Bengio
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this work, we propose a novel constituency parsing scheme. The model first predicts a real-valued scalar, named syntactic distance, for each split position in the sentence. The topology of the grammar tree is then determined by the values of the syntactic distances. Compared to traditional shift-reduce parsing schemes, our approach is free from the potentially disastrous compounding error. It is also easier to parallelize and much faster. Our model achieves the state-of-the-art single model F1 score of 92.1 on the PTB and 86.4 on the CTB dataset, which surpasses the previous single model results by a large margin.

2017

Adversarial Generation of Natural Language
Sandeep Subramanian | Sai Rajeswar | Francis Dutil | Chris Pal | Aaron Courville
Proceedings of the 2nd Workshop on Representation Learning for NLP

Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise, however, are not commensurate with the progress made in generating images, and still lag far behind likelihood-based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.

Piecewise Latent Variables for Neural Variational Text Processing
Iulian Vlad Serban | Alexander Ororbia II | Joelle Pineau | Aaron Courville
Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing

Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.

Piecewise Latent Variables for Neural Variational Text Processing
Iulian Vlad Serban | Alexander G. Ororbia | Joelle Pineau | Aaron Courville
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.

2016

Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
Iulian Vlad Serban | Alberto García-Durán | Caglar Gulcehre | Sungjin Ahn | Sarath Chandar | Aaron Courville | Yoshua Bengio
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)