Lin Li


Chinese Long and Short Form Choice Exploiting Neural Network Language Modeling Approaches
Lin Li | Kees van Deemter | Denis Paperno
Proceedings of the 19th Chinese National Conference on Computational Linguistics

This paper presents our work on the choice between long and short forms, a significant question of lexical choice that plays an important role in many Natural Language Understanding tasks. Long and short forms, which share at least one identical word meaning but differ in the number of syllables, are a highly frequent linguistic phenomenon in Chinese, as in 老虎-虎 (laohu-hu, tiger).

Gradations of Error Severity in Automatic Image Descriptions
Emiel van Miltenburg | Wei-Ting Lu | Emiel Krahmer | Albert Gatt | Guanyi Chen | Lin Li | Kees van Deemter
Proceedings of the 13th International Conference on Natural Language Generation

Earlier research has shown that evaluation metrics based on textual similarity (e.g., BLEU, CIDEr, Meteor) do not correlate well with human evaluation scores for automatically generated text. We carried out an experiment with Chinese speakers, where we systematically manipulated image descriptions to contain different kinds of errors. Because our manipulated descriptions form minimal pairs with the reference descriptions, we are able to assess the impact of different kinds of errors on the perceived quality of the descriptions. Our results show that different kinds of errors elicit significantly different evaluation scores, even though all erroneous descriptions differ in only one character from the reference descriptions. Evaluation metrics based solely on textual similarity are unable to capture these differences, which (at least partially) explains their poor correlation with human judgments. Our work provides the foundations for future work, where we aim to understand why different errors are seen as more or less severe.

Manifold Learning-based Word Representation Refinement Incorporating Global and Local Information
Wenyu Zhao | Dong Zhou | Lin Li | Jinjun Chen
Proceedings of the 28th International Conference on Computational Linguistics

Recent studies show that word embedding models often underestimate similarities between similar words and overestimate similarities between distant words. As a result, word similarity scores obtained from embedding models are inconsistent with human judgment. Manifold learning-based methods are widely used to refine word representations by re-embedding word vectors from the original embedding space into a new, refined semantic space. These methods mainly focus on preserving local geometry information by performing a weighted locally linear combination between words and their neighbors twice. However, the reconstruction weights are easily influenced by the selection of neighboring words, and the whole combination process is time-consuming. In this paper, we propose two novel word representation refinement methods leveraging isometric feature mapping and local tangent space respectively. Unlike previous methods, our first method corrects pre-trained word embeddings by preserving global geometry information of all words instead of local geometry information between words and their neighbors. Our second method refines word representations by aligning the original and refined embedding spaces based on local tangent space instead of performing a weighted locally linear combination twice. Experimental results obtained from standard semantic relatedness and semantic similarity tasks show that our methods outperform various state-of-the-art baselines for word representation refinement.

Conversational Semantic Parsing for Dialog State Tracking
Jianpeng Cheng | Devang Agrawal | Héctor Martínez Alonso | Shruti Bhargava | Joris Driesen | Federico Flego | Dain Kaplan | Dimitri Kartsaklis | Lin Li | Dhivya Piraviperumal | Jason D. Williams | Hong Yu | Diarmuid Ó Séaghdha | Anders Johannsen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We consider a new perspective on dialog state tracking (DST), the task of estimating a user’s goal through the course of a dialog. By formulating DST as a semantic parsing task over hierarchical representations, we can incorporate semantic compositionality, cross-domain knowledge sharing and co-reference. We present TreeDST, a dataset of 27k conversations annotated with tree-structured dialog states and system acts. We describe an encoder-decoder framework for DST with hierarchical representations, which leads to ~20% improvement over state-of-the-art DST approaches that operate on a flat meaning space of slot-value pairs.


Choosing between Long and Short Word Forms in Mandarin
Lin Li | Kees van Deemter | Denis Paperno | Jingyu Fan
Proceedings of the 12th International Conference on Natural Language Generation

Between 80% and 90% of all Chinese words have a long and a short form, such as 老虎/虎 (lao-hu/hu, tiger) (Duanmu, 2013). Consequently, the choice between long and short forms is a key problem for lexical choice across NLP and NLG. Following earlier work on abbreviations in English (Mahowald et al., 2013), we bring a probabilistic perspective to these questions, using both a behavioral and a corpus-based approach. We hypothesized that Mandarin speakers are more likely to choose the short form in a supportive context than in a neutral context. Consistent with our prediction, our findings revealed that the predictability of the context affects speakers’ choice between long and short forms.