Chenghua Lin


2020

Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation
Ruizhe Li | Xiao Li | Guanyi Chen | Chenghua Lin
Proceedings of the 28th International Conference on Computational Linguistics

The Variational Autoencoder (VAE) is a popular and powerful model applied to text modelling to generate diverse sentences. However, an issue known as posterior collapse (or KL loss vanishing) arises when the VAE is used in text modelling: the approximate posterior collapses to the prior, and the model ignores the latent variables entirely, degrading to a plain language model during text generation. This issue is particularly prevalent when RNN-based VAE models are employed for text modelling. In this paper, we propose a simple, generic architecture called Timestep-Wise Regularisation VAE (TWR-VAE), which can effectively avoid posterior collapse and can be applied to any RNN-based VAE model. The effectiveness and versatility of our model are demonstrated in different tasks, including language modelling and dialogue response generation.
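
As a rough illustration of what "the approximate posterior collapses to the prior" means numerically, the sketch below computes the KL term for a diagonal Gaussian posterior against a standard normal prior. This is the standard closed-form KL divergence, not the TWR-VAE architecture itself; the function name `kl_to_standard_normal` and the example values are hypothetical.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form.

    mu and logvar are per-dimension posterior means and log-variances.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar)
    )


# Collapsed posterior: identical to the prior, so the KL term vanishes
# and the latent code carries no information about the input.
collapsed = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# Informative posterior (made-up values): the KL term is strictly positive.
informative = kl_to_standard_normal([1.0, -0.5], [-1.0, 0.2])

print(collapsed, informative)
```

A KL term that stays near zero throughout training is the usual symptom of the collapse described in the abstract.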

DGST: a Dual-Generator Network for Text Style Transfer
Xiao Li | Guanyi Chen | Chenghua Lin | Ruizhe Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose DGST, a novel and simple Dual-Generator network architecture for text Style Transfer. Our model employs two generators only, and does not rely on any discriminators or parallel corpus for training. Both quantitative and qualitative experiments on the Yelp and IMDb datasets show that our model gives competitive performance compared to several strong baselines with more complicated architecture designs.

2019

A Dual-Attention Hierarchical Recurrent Neural Network for Dialogue Act Classification
Ruizhe Li | Chenghua Lin | Matthew Collinson | Xiao Li | Guanyi Chen
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Recognising dialogue acts (DA) is important for many natural language processing tasks such as dialogue generation and intention recognition. In this paper, we propose a dual-attention hierarchical recurrent neural network for DA classification. Our model is partially inspired by the observation that conversational utterances are normally associated with both a DA and a topic, where the former captures the social act and the latter describes the subject matter. However, this dependency between DAs and topics has not been utilised by most existing systems for DA classification. With a novel dual task-specific attention mechanism, our model is able to capture, for each utterance, information about both DAs and topics, as well as information about the interactions between them. Experimental results show that by modelling topic as an auxiliary task, our model can significantly improve DA classification, yielding better or comparable performance to the state-of-the-art method on three public datasets.

End-to-End Sequential Metaphor Identification Inspired by Linguistic Theories
Rui Mao | Chenghua Lin | Frank Guerin
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

End-to-end training with Deep Neural Networks (DNN) is a currently popular method for metaphor identification. However, standard sequence tagging models do not explicitly take advantage of linguistic theories of metaphor identification. We experiment with two DNN models which are inspired by two human metaphor identification procedures. By testing on three public datasets, we find that our models achieve state-of-the-art performance in end-to-end metaphor identification.

Proceedings of the 12th International Conference on Natural Language Generation
Kees van Deemter | Chenghua Lin | Hiroya Takamura
Proceedings of the 12th International Conference on Natural Language Generation

QTUNA: A Corpus for Understanding How Speakers Use Quantification
Guanyi Chen | Kees van Deemter | Silvia Pagliaro | Louk Smalbil | Chenghua Lin
Proceedings of the 12th International Conference on Natural Language Generation

A prominent strand of work in formal semantics investigates the ways in which human languages quantify over the elements of a set, as when we say “All A are B”, “All except two A are B”, “Only a few of the A are B” and so on. Our aim is to build Natural Language Generation algorithms that mimic humans’ use of quantified expressions. To inform these algorithms, we conducted a series of elicitation experiments in which human speakers were asked to perform a linguistic task that invites the use of quantified expressions. We discuss how these experiments were conducted and what corpora they gave rise to. We conduct an informal analysis of the corpora, and offer an initial assessment of the challenges that these corpora pose for Natural Language Generation. The dataset is available at: https://github.com/a-quei/qtuna.

Generating Quantified Descriptions of Abstract Visual Scenes
Guanyi Chen | Kees van Deemter | Chenghua Lin
Proceedings of the 12th International Conference on Natural Language Generation

Quantified expressions have always taken up a central position in formal theories of meaning and language use. Yet quantified expressions have so far attracted far less attention from the Natural Language Generation community than, for example, referring expressions. In an attempt to start redressing the balance, we investigate a recently developed corpus in which quantified expressions play a crucial role; the corpus is the result of a carefully controlled elicitation experiment, in which human participants were asked to describe visually presented scenes. Informed by an analysis of this corpus, we propose algorithms that produce computer-generated descriptions of a wider class of visual scenes, and we evaluate the descriptions generated by these algorithms in terms of their correctness, completeness, and human-likeness. We discuss what this exercise can teach us about the nature of quantification and about the challenges posed by the generation of quantified expressions.

A Stable Variational Autoencoder for Text Modelling
Ruizhe Li | Xiao Li | Chenghua Lin | Matthew Collinson | Rui Mao
Proceedings of the 12th International Conference on Natural Language Generation

The Variational Autoencoder (VAE) is a powerful method for learning representations of high-dimensional data. However, VAEs can suffer from an issue known as latent variable collapse (or KL term vanishing), where the posterior collapses to the prior and the model ignores the latent codes in generative tasks. This issue is particularly prevalent when employing VAE-RNN architectures for text modelling (Bowman et al., 2016; Yang et al., 2017). In this paper, we present a new architecture called Full-Sampling-VAE-RNN, which can effectively avoid latent variable collapse. Compared to the general VAE-RNN architectures, we show that our model achieves a much more stable training process and generates text of significantly better quality.

2018

ABDN at SemEval-2018 Task 10: Recognising Discriminative Attributes using Context Embeddings and WordNet
Rui Mao | Guanyi Chen | Ruizhe Li | Chenghua Lin
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes the system that we submitted for SemEval-2018 task 10: capturing discriminative attributes. Our system is built upon the simple idea of measuring the attribute word’s similarity with each of the two semantically similar words, based on an extended word embedding method and WordNet. Instead of computing the similarities between the attribute and the semantically similar words using standard word embeddings, we propose a novel method that combines word and context embeddings, which can better measure similarities. Our model is simple and effective, achieving an average F1 score of 0.62 on the test set.
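
The core idea of comparing an attribute word's similarity to each of two related words can be sketched as follows. This is only a minimal illustration with plain cosine similarity over made-up toy vectors; it does not reproduce the submitted system's combined word and context embeddings or its use of WordNet, and the `margin` threshold and the function name `is_discriminative` are assumptions introduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_discriminative(attr, word1, word2, margin=0.1):
    """True if attr is clearly closer to word1 than to word2.

    margin is a hypothetical threshold, not a value from the paper.
    """
    return cosine(attr, word1) - cosine(attr, word2) > margin


# Toy 3-dimensional vectors, invented purely for illustration.
banana = [0.9, 0.1, 0.0]
apple = [0.3, 0.9, 0.0]
yellow = [1.0, 0.0, 0.1]
fruit = [0.6, 0.6, 0.0]

print(is_discriminative(yellow, banana, apple))
print(is_discriminative(fruit, banana, apple))
```

With these toy vectors, "yellow" sits much closer to "banana" than to "apple", whereas "fruit" is not closer to "banana" by the margin, so only the first triple is flagged as discriminative.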

Word Embedding and WordNet Based Metaphor Identification and Interpretation
Rui Mao | Chenghua Lin | Frank Guerin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Metaphoric expressions are widespread in natural language, posing a significant challenge for various natural language processing tasks such as Machine Translation. Current word embedding based metaphor identification models cannot identify the exact metaphorical words within a sentence. In this paper, we propose an unsupervised learning method that identifies and interprets metaphors at the word level without any preprocessing, outperforming strong baselines in the metaphor identification task. Our model also interprets the identified metaphors, paraphrasing them into their literal counterparts so that they can be better translated by machines. We evaluated this with two popular translation systems for English to Chinese, showing that our model improved the systems significantly.

SimpleNLG-ZH: a Linguistic Realisation Engine for Mandarin
Guanyi Chen | Kees van Deemter | Chenghua Lin
Proceedings of the 11th International Conference on Natural Language Generation

We introduce SimpleNLG-ZH, a realisation engine for Mandarin that follows the software design paradigm of SimpleNLG (Gatt and Reiter, 2009). We explain the core grammar (morphology and syntax) and the lexicon of SimpleNLG-ZH, which is very different from English and other languages for which SimpleNLG engines have been built. The system was evaluated by regenerating expressions from a body of test sentences and a corpus of human-authored expressions. Human evaluation was conducted to estimate the quality of regenerated sentences.

Modelling Pro-drop with the Rational Speech Acts Model
Guanyi Chen | Kees van Deemter | Chenghua Lin
Proceedings of the 11th International Conference on Natural Language Generation

We extend the classic Referring Expressions Generation task by considering zero pronouns in “pro-drop” languages such as Chinese, modelling their use by means of the Bayesian Rational Speech Acts model (Frank and Goodman, 2012). By assuming that highly salient referents are most likely to be referred to by zero pronouns (i.e., pro-drop is more likely for salient referents than for less salient ones), the model offers an attractive explanation of a phenomenon not previously addressed probabilistically.

Statistical NLG for Generating the Content and Form of Referring Expressions
Xiao Li | Kees van Deemter | Chenghua Lin
Proceedings of the 11th International Conference on Natural Language Generation

This paper argues that a new generic approach to statistical NLG can be made to perform Referring Expression Generation (REG) successfully. The model not only selects attributes and values for referring to a target referent, but also performs Linguistic Realisation, generating an actual Noun Phrase. Our evaluations suggest that the attribute selection aspect of the algorithm outperforms classic REG algorithms, while the Noun Phrases generated are as similar to those in a previously developed corpus as were Noun Phrases produced by a new set of human speakers.

Generating Description for Sequential Images with Local-Object Attention Conditioned on Global Semantic Context
Jing Su | Chenghua Lin | Mian Zhou | Qingyun Dai | Haoyu Lv
Proceedings of the Workshop on Intelligent Interactive Systems and Language Generation (2IS&NLG)

2017

Extracting and Understanding Contrastive Opinion through Topic Relevant Sentences
Ebuka Ibeke | Chenghua Lin | Adam Wyner | Mohamad Hardyman Barawi
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Contrastive opinion mining is essential in identifying, extracting and organising opinions from user generated texts. Most existing studies separate input data into respective collections. In addition, the relationships between the topics extracted and the sentences in the corpus which express the topics are opaque, hindering our understanding of the opinions expressed in the corpus. We propose a novel unified latent variable model (contraLDA) which addresses the above matters. Experimental results show the effectiveness of our model in mining contrasted opinions, outperforming our baselines.

Analysing the Causes of Depressed Mood from Depression Vulnerable Individuals
Noor Fazilla Abd Yusof | Chenghua Lin | Frank Guerin
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

We develop a computational model to discover the potential causes of depression by analysing the topics in user-generated text. We show the most prominent causes, and how these causes evolve over time. We also highlight the differences in causes between students with low and high neuroticism. Our studies demonstrate that the topics reveal valuable clues about the causes contributing to depressed mood. Identifying causes can have a significant impact on improving the quality of depression care, providing greater insight into a patient’s state for pertinent treatment recommendations. Hence, this study significantly expands the ability to discover the potential factors that trigger depression, making it possible to increase the efficiency of depression treatment.

2016

Statistics-Based Lexical Choice for NLG from Quantitative Information
Xiao Li | Kees van Deemter | Chenghua Lin
Proceedings of the 9th International Natural Language Generation conference

2011

Sentence Subjectivity Detection with Weakly-Supervised Learning
Chenghua Lin | Yulan He | Richard Everson
Proceedings of 5th International Joint Conference on Natural Language Processing

Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification
Yulan He | Chenghua Lin | Harith Alani
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

A Comparative Study of Bayesian Models for Unsupervised Sentiment Detection
Chenghua Lin | Yulan He | Richard Everson
Proceedings of the Fourteenth Conference on Computational Natural Language Learning