Hareesh Bahuleyan


pdf bib
Diverse Keyphrase Generation with Neural Unlikelihood Training
Hareesh Bahuleyan | Layla El Asri
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we study sequence-to-sequence (S2S) keyphrase generation models from the perspective of diversity. Recent advances in neural natural language generation have made possible remarkable progress on the task of keyphrase generation, demonstrated through improvements on quality metrics such as F1-score. However, the importance of diversity in keyphrase generation has been largely ignored. We first analyze the extent of information redundancy present in the outputs generated by a baseline model trained using maximum likelihood estimation (MLE). Our findings show that repetition of keyphrases is a major issue with MLE training. To alleviate this issue, we adopt neural unlikelihood (UL) objective for training the S2S model. Our version of UL training operates at (1) the target token level to discourage the generation of repeating tokens; (2) the copy token level to avoid copying repetitive tokens from the source text. Further, to encourage better model planning during the decoding process, we incorporate K-step ahead token prediction objective that computes both MLE and UL losses on future tokens as well. Through extensive experiments on datasets from three different domains we demonstrate that the proposed approach attains considerably large diversity gains, while maintaining competitive output quality.


pdf bib
Disentangled Representation Learning for Non-Parallel Text Style Transfer
Vineet John | Lili Mou | Hareesh Bahuleyan | Olga Vechtomova
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper tackles the problem of disentangling the latent representations of style and content in language models. We propose a simple yet effective approach, which incorporates auxiliary multi-task and adversarial objectives, for style prediction and bag-of-words prediction, respectively. We show, both qualitatively and quantitatively, that the style and content are indeed disentangled in the latent space. This disentangled latent representation learning can be applied to style transfer on non-parallel corpora. We achieve high performance in terms of transfer accuracy, content preservation, and language fluency, in comparison to various previous approaches.

pdf bib
Stochastic Wasserstein Autoencoder for Probabilistic Sentence Generation
Hareesh Bahuleyan | Lili Mou | Hao Zhou | Olga Vechtomova
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The variational autoencoder (VAE) imposes a probabilistic distribution (typically Gaussian) on the latent space and penalizes the Kullback-Leibler (KL) divergence between the posterior and prior. In NLP, VAEs are extremely difficult to train due to the problem of KL collapsing to zero. One has to implement various heuristics such as KL weight annealing and word dropout in a carefully engineered manner to successfully train a VAE for text. In this paper, we propose to use the Wasserstein autoencoder (WAE) for probabilistic sentence generation, where the encoder could be either stochastic or deterministic. We show theoretically and empirically that, in the original WAE, the stochastically encoded Gaussian distribution tends to become a Dirac-delta function, and we propose a variant of WAE that encourages the stochasticity of the encoder. Experimental results show that the latent space learned by WAE exhibits properties of continuity and smoothness as in VAEs, while simultaneously achieving much higher BLEU scores for sentence reconstruction.


pdf bib
Variational Attention for Sequence-to-Sequence Models
Hareesh Bahuleyan | Lili Mou | Olga Vechtomova | Pascal Poupart
Proceedings of the 27th International Conference on Computational Linguistics

The variational encoder-decoder (VED) encodes source information as a set of random variables using a neural network, which in turn is decoded into target data using another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as encoder-decoder networks. When combined with a traditional (deterministic) attention mechanism, the variational latent space may be bypassed by the attention model, and thus becomes ineffective. In this paper, we propose a variational attention mechanism for VED, where the attention vector is also modeled as Gaussian distributed random variables. Results on two experiments show that, without loss of quality, our proposed method alleviates the bypassing phenomenon as it increases the diversity of generated sentences.


pdf bib
UWaterloo at SemEval-2017 Task 8: Detecting Stance towards Rumours with Topic Independent Features
Hareesh Bahuleyan | Olga Vechtomova
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our system for subtask-A: SDQC for RumourEval, task-8 of SemEval 2017. Identifying rumours, especially for breaking news events as they unfold, is a challenging task due to the absence of sufficient information about the exact rumour stories circulating on social media. Determining the stance of Twitter users towards rumourous messages could provide an indirect way of identifying potential rumours. The proposed approach makes use of topic independent features from two categories, namely cue features and message specific features to fit a gradient boosting classifier. With an accuracy of 0.78, our system achieved the second best performance on subtask-A of RumourEval.