Lili Mou


2020

pdf bib
Unsupervised Paraphrasing by Simulated Annealing
Xianggen Liu | Lili Mou | Fandong Meng | Hao Zhou | Jie Zhou | Sen Song
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose UPSA, a novel approach that accomplishes Unsupervised Paraphrasing by Simulated Annealing. We model paraphrase generation as an optimization problem and propose a sophisticated objective function, involving semantic similarity, expression diversity, and language fluency of paraphrases. UPSA searches the sentence space towards this objective by performing a sequence of local editing. We evaluate our approach on various datasets, namely, Quora, Wikianswers, MSCOCO, and Twitter. Extensive results show that UPSA achieves the state-of-the-art performance compared with previous unsupervised methods in terms of both automatic and human evaluations. Further, our approach outperforms most existing domain-adapted supervised models, showing the generalizability of UPSA.

pdf bib
Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction
Raphael Schumann | Lili Mou | Yao Lu | Olga Vechtomova | Katja Markert
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Automatic sentence summarization produces a shorter version of a sentence, while preserving its most important information. A good summary is characterized by language fluency and high information overlap with the source sentence. We model these two aspects in an unsupervised objective function, consisting of language modeling and semantic similarity metrics. We search for a high-scoring summary by discrete optimization. Our proposed method achieves a new state-of-the art for unsupervised sentence summarization according to ROUGE scores. Additionally, we demonstrate that the commonly reported ROUGE F1 metric is sensitive to summary length. Since this is unwillingly exploited in recent work, we emphasize that future evaluation should explicitly group summarization systems by output length brackets.

pdf bib
Iterative Edit-Based Unsupervised Sentence Simplification
Dhruv Kumar | Lili Mou | Lukasz Golab | Olga Vechtomova
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present a novel iterative, edit-based approach to unsupervised sentence simplification. Our model is guided by a scoring function involving fluency, simplicity, and meaning preservation. Then, we iteratively perform word and phrase-level edits on the complex sentence. Compared with previous approaches, our model does not require a parallel training set, but is more controllable and interpretable. Experiments on Newsela and WikiLarge datasets show that our approach is nearly as effective as state-of-the-art supervised approaches.

pdf bib
Stylized Text Generation: Approaches and Applications
Lili Mou | Olga Vechtomova
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Text generation has played an important role in various applications of natural language processing (NLP), and kn recent studies, researchers are paying increasing attention to modeling and manipulating the style of the generation text, which we call stylized text generation. In this tutorial, we will provide a comprehensive literature review in this direction. We start from the definition of style and different settings of stylized text generation, illustrated with various applications. Then, we present different settings of stylized generation, such as style-conditioned generation, style-transfer generation, and style-adversarial generation. In each setting, we delve deep into machine learning methods, including embedding learning techniques to represent style, adversarial learning, and reinforcement learning with cycle consistency to match content but to distinguish different styles. We also introduce current approaches to evaluating stylized text generation systems. We conclude our tutorial by presenting the challenges of stylized text generation and discussing future directions, such as small-data training, non-categorical style modeling, and a generalized scope of style transfer (e.g., controlling the syntax as a style).

pdf bib
Formality Style Transfer with Shared Latent Space
Yunli Wang | Yu Wu | Lili Mou | Zhoujun Li | WenHan Chao
Proceedings of the 28th International Conference on Computational Linguistics

Conventional approaches for formality style transfer borrow models from neural machine translation, which typically requires massive parallel data for training. However, the dataset for formality style transfer is considerably smaller than translation corpora. Moreover, we observe that informal and formal sentences closely resemble each other, which is different from the translation task where two languages have different vocabularies and grammars. In this paper, we present a new approach, Sequence-to-Sequence with Shared Latent Space (S2S-SLS), for formality style transfer, where we propose two auxiliary losses and adopt joint training of bi-directional transfer and auto-encoding. Experimental results show that S2S-SLS (with either RNN or Transformer architectures) consistently outperforms baselines in various settings, especially when we have limited data.

pdf bib
Adversarial Learning on the Latent Space for Diverse Dialog Generation
Kashif Khan | Gaurav Sahu | Vikash Balasubramanian | Lili Mou | Olga Vechtomova
Proceedings of the 28th International Conference on Computational Linguistics

Generating relevant responses in a dialog is challenging, and requires not only proper modeling of context in the conversation, but also being able to generate fluent sentences during inference. In this paper, we propose a two-step framework based on generative adversarial nets for generating conditioned responses. Our model first learns a meaningful representation of sentences by autoencoding, and then learns to map an input query to the response representation, which is in turn decoded as a response sentence. Both quantitative and qualitative evaluations show that our model generates more fluent, relevant, and diverse responses than existing state-of-the-art methods.

pdf bib
Improving Word Sense Disambiguation with Translations
Yixing Luan | Bradley Hauer | Lili Mou | Grzegorz Kondrak
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

It has been conjectured that multilingual information can help monolingual word sense disambiguation (WSD). However, existing WSD systems rarely consider multilingual information, and no effective method has been proposed for improving WSD by generating translations. In this paper, we present a novel approach that improves the performance of a base WSD system using machine translation. Since our approach is language independent, we perform WSD experiments on several languages. The results demonstrate that our methods can consistently improve the performance of WSD systems, and obtain state-ofthe-art results in both English and multilingual WSD. To facilitate the use of lexical translation information, we also propose BABALIGN, an precise bitext alignment algorithm which is guided by multilingual lexical correspondences from BabelNet.

2019

pdf bib
Harnessing Pre-Trained Neural Networks with Rules for Formality Style Transfer
Yunli Wang | Yu Wu | Lili Mou | Zhoujun Li | Wenhan Chao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Formality text style transfer plays an important role in various NLP applications, such as non-native speaker assistants and child education. Early studies normalize informal sentences with rules, before statistical and neural models become a prevailing method in the field. While a rule-based system is still a common preprocessing step for formality style transfer in the neural era, it could introduce noise if we use the rules in a naive way such as data preprocessing. To mitigate this problem, we study how to harness rules into a state-of-the-art neural network that is typically pretrained on massive corpora. We propose three fine-tuning methods in this paper and achieve a new state-of-the-art on benchmark datasets

bib
Discreteness in Neural Natural Language Processing
Lili Mou | Hao Zhou | Lei Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): Tutorial Abstracts

This tutorial provides a comprehensive guide to the process of discreteness in neural NLP.As a gentle start, we will briefly introduce the background of deep learning based NLP, where we point out the ubiquitous discreteness of natural language and its challenges in neural information processing. Particularly, we will focus on how such discreteness plays a role in the input space, the latent space, and the output space of a neural network. In each part, we will provide examples, discuss machine learning techniques, as well as demonstrate NLP applications.

pdf bib
Disentangled Representation Learning for Non-Parallel Text Style Transfer
Vineet John | Lili Mou | Hareesh Bahuleyan | Olga Vechtomova
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper tackles the problem of disentangling the latent representations of style and content in language models. We propose a simple yet effective approach, which incorporates auxiliary multi-task and adversarial objectives, for style prediction and bag-of-words prediction, respectively. We show, both qualitatively and quantitatively, that the style and content are indeed disentangled in the latent space. This disentangled latent representation learning can be applied to style transfer on non-parallel corpora. We achieve high performance in terms of transfer accuracy, content preservation, and language fluency, in comparison to various previous approaches.

pdf bib
An Imitation Learning Approach to Unsupervised Parsing
Bowen Li | Lili Mou | Frank Keller
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recently, there has been an increasing interest in unsupervised parsers that optimize semantically oriented objectives, typically using reinforcement learning. Unfortunately, the learned trees often do not match actual syntax trees well. Shen et al. (2018) propose a structured attention mechanism for language modeling (PRPN), which induces better syntactic structures but relies on ad hoc heuristics. Also, their model lacks interpretability as it is not grounded in parsing actions. In our work, we propose an imitation learning approach to unsupervised parsing, where we transfer the syntactic knowledge induced by PRPN to a Tree-LSTM model with discrete parsing actions. Its policy is then refined by Gumbel-Softmax training towards a semantically oriented objective. We evaluate our approach on the All Natural Language Inference dataset and show that it achieves a new state of the art in terms of parsing F-score, outperforming our base models, including PRPN.

pdf bib
Generating Sentences from Disentangled Syntactic and Semantic Spaces
Yu Bao | Hao Zhou | Shujian Huang | Lei Li | Lili Mou | Olga Vechtomova | Xin-yu Dai | Jiajun Chen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Variational auto-encoders (VAEs) are widely used in natural language generation due to the regularization of the latent space. However, generating sentences from the continuous latent space does not explicitly model the syntactic information. In this paper, we propose to generate sentences from disentangled syntactic and semantic spaces. Our proposed method explicitly models syntactic information in the VAE’s latent space by using the linearized tree sequence, leading to better performance of language generation. Additionally, the advantage of sampling in the disentangled syntactic and semantic latent spaces enables us to perform novel applications, such as the unsupervised paraphrase generation and syntax transfer generation. Experimental results show that our proposed model achieves similar or better performance in various tasks, compared with state-of-the-art related work.

pdf bib
Stochastic Wasserstein Autoencoder for Probabilistic Sentence Generation
Hareesh Bahuleyan | Lili Mou | Hao Zhou | Olga Vechtomova
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The variational autoencoder (VAE) imposes a probabilistic distribution (typically Gaussian) on the latent space and penalizes the Kullback-Leibler (KL) divergence between the posterior and prior. In NLP, VAEs are extremely difficult to train due to the problem of KL collapsing to zero. One has to implement various heuristics such as KL weight annealing and word dropout in a carefully engineered manner to successfully train a VAE for text. In this paper, we propose to use the Wasserstein autoencoder (WAE) for probabilistic sentence generation, where the encoder could be either stochastic or deterministic. We show theoretically and empirically that, in the original WAE, the stochastically encoded Gaussian distribution tends to become a Dirac-delta function, and we propose a variant of WAE that encourages the stochasticity of the encoder. Experimental results show that the latent space learned by WAE exhibits properties of continuity and smoothness as in VAEs, while simultaneously achieving much higher BLEU scores for sentence reconstruction.

2018

pdf bib
Towards Neural Speaker Modeling in Multi-Party Conversation: The Task, Dataset, and Models
Zhao Meng | Lili Mou | Zhi Jin
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Variational Attention for Sequence-to-Sequence Models
Hareesh Bahuleyan | Lili Mou | Olga Vechtomova | Pascal Poupart
Proceedings of the 27th International Conference on Computational Linguistics

The variational encoder-decoder (VED) encodes source information as a set of random variables using a neural network, which in turn is decoded into target data using another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as encoder-decoder networks. When combined with a traditional (deterministic) attention mechanism, the variational latent space may be bypassed by the attention model, and thus becomes ineffective. In this paper, we propose a variational attention mechanism for VED, where the attention vector is also modeled as Gaussian distributed random variables. Results on two experiments show that, without loss of quality, our proposed method alleviates the bypassing phenomenon as it increases the diversity of generated sentences.

pdf bib
Modeling Past and Future for Neural Machine Translation
Zaixiang Zheng | Hao Zhou | Shujian Huang | Lili Mou | Xinyu Dai | Jiajun Chen | Zhaopeng Tu
Transactions of the Association for Computational Linguistics, Volume 6

Existing neural machine translation systems do not explicitly model what has been translated and what has not during the decoding phase. To address this problem, we propose a novel mechanism that separates the source information into two parts: translated Past contents and untranslated Future contents, which are modeled by two additional recurrent layers. The Past and Future contents are fed to both the attention model and the decoder states, which provides Neural Machine Translation (NMT) systems with the knowledge of translated and untranslated contents. Experimental results show that the proposed approach significantly improves the performance in Chinese-English, German-English, and English-German translation tasks. Specifically, the proposed model outperforms the conventional coverage model in terms of both the translation quality and the alignment error rate.

2017

pdf bib
How to Make Context More Useful? An Empirical Study on Context-Aware Neural Conversational Models
Zhiliang Tian | Rui Yan | Lili Mou | Yiping Song | Yansong Feng | Dongyan Zhao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Generative conversational systems are attracting increasing attention in natural language processing (NLP). Recently, researchers have noticed the importance of context information in dialog processing, and built various models to utilize context. However, there is no systematic comparison to analyze how to use context effectively. In this paper, we conduct an empirical study to compare various models and investigate the effect of context information in dialog systems. We also propose a variant that explicitly weights context vectors by context-query relevance, outperforming the other baselines.

2016

pdf bib
How Transferable are Neural Networks in NLP Applications?
Lili Mou | Zhao Meng | Rui Yan | Ge Li | Yan Xu | Lu Zhang | Zhi Jin
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Improved relation classification by deep recurrent neural networks with data augmentation
Yan Xu | Ran Jia | Lili Mou | Ge Li | Yunchuan Chen | Yangyang Lu | Zhi Jin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Nowadays, neural networks play an important role in the task of relation classification. By designing different neural architectures, researchers have improved the performance to a large extent in comparison with traditional methods. However, existing neural networks for relation classification are usually of shallow architectures (e.g., one-layer convolutional neural networks or recurrent networks). They may fail to explore the potential representation space in different abstraction levels. In this paper, we propose deep recurrent neural networks (DRNNs) for relation classification to tackle this challenge. Further, we propose a data augmentation method by leveraging the directionality of relations. We evaluated our DRNNs on the SemEval-2010 Task 8, and achieve an F1-score of 86.1%, outperforming previous state-of-the-art recorded results.

pdf bib
Sequence to Backward and Forward Sequences: A Content-Introducing Approach to Generative Short-Text Conversation
Lili Mou | Yiping Song | Rui Yan | Ge Li | Lu Zhang | Zhi Jin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Using neural networks to generate replies in human-computer dialogue systems is attracting increasing attention over the past few years. However, the performance is not satisfactory: the neural network tends to generate safe, universally relevant replies which carry little meaning. In this paper, we propose a content-introducing approach to neural network-based generative dialogue systems. We first use pointwise mutual information (PMI) to predict a noun as a keyword, reflecting the main gist of the reply. We then propose seq2BF, a “sequence to backward and forward sequences” model, which generates a reply containing the given keyword. Experimental results show that our approach significantly outperforms traditional sequence-to-sequence models in terms of human evaluation and the entropy measure, and that the predicted keyword can appear at an appropriate position in the reply.

pdf bib
Compressing Neural Language Models by Sparse Word Representations
Yunchuan Chen | Lili Mou | Yan Xu | Ge Li | Zhi Jin
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Natural Language Inference by Tree-Based Convolution and Heuristic Matching
Lili Mou | Rui Men | Ge Li | Yan Xu | Lu Zhang | Rui Yan | Zhi Jin
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

pdf bib
Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths
Yan Xu | Lili Mou | Ge Li | Yunchuan Chen | Hao Peng | Zhi Jin
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Comparative Study on Regularization Strategies for Embedding-based Neural Networks
Hao Peng | Lili Mou | Ge Li | Yunchuan Chen | Yangyang Lu | Zhi Jin
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Discriminative Neural Sentence Modeling by Tree-Based Convolution
Lili Mou | Hao Peng | Ge Li | Yan Xu | Lu Zhang | Zhi Jin
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing