Ming Li


pdf bib
Two Birds, One Stone: A Simple, Unified Model for Text Generation from Structured and Unstructured Data
Hamidreza Shahidi | Ming Li | Jimmy Lin
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

A number of researchers have recently questioned the necessity of increasingly complex neural network (NN) architectures. In particular, several recent papers have shown that simpler, properly tuned models are at least competitive across several NLP tasks. In this work, we show that this is also the case for text generation from structured and unstructured data. We consider neural table-to-text generation and neural question generation (NQG) tasks for text generation from structured and unstructured data, respectively. Table-to-text generation aims to generate a description based on a given table, and NQG is the task of generating a question from a given passage where the generated question can be answered by a certain sub-span of the passage using NN models. Experimental results demonstrate that a basic attention-based seq2seq model trained with the exponential moving average technique achieves the state of the art in both tasks. Code is available at https://github.com/h-shahidi/2birds-gen.


pdf bib
End-to-End Neural Context Reconstruction in Chinese Dialogue
Wei Yang | Rui Qiao | Haocheng Qin | Amy Sun | Luchen Tan | Kun Xiong | Ming Li
Proceedings of the First Workshop on NLP for Conversational AI

We tackle the problem of context reconstruction in Chinese dialogue, where the task is to replace pronouns, zero pronouns, and other referring expressions with their referent nouns so that sentences can be processed in isolation without context. Following a standard decomposition of the context reconstruction task into referring expression detection and coreference resolution, we propose a novel end-to-end architecture for separately and jointly accomplishing this task. Key features of this model include POS and position encoding using CNNs and a novel pronoun masking mechanism. One perennial problem in building such models is the paucity of training data, which we address by augmenting previously-proposed methods to generate a large amount of realistic training data. The combination of more data and better models yields accuracy higher than the state-of-the-art method in coreference resolution and end-to-end context reconstruction.

pdf bib
Detecting Customer Complaint Escalation with Recurrent Neural Networks and Manually-Engineered Features
Wei Yang | Luchen Tan | Chunwei Lu | Anqi Cui | Han Li | Xi Chen | Kun Xiong | Muzi Wang | Ming Li | Jian Pei | Jimmy Lin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

Consumers dissatisfied with the normal dispute resolution process provided by an e-commerce company’s customer service agents have the option of escalating their complaints by filing grievances with a government authority. This paper tackles the challenge of monitoring ongoing text chat dialogues to identify cases where the customer expresses such an intent, providing triage and prioritization for a separate pool of specialized agents specially trained to handle more complex situations. We describe a hybrid model that tackles this challenge by integrating recurrent neural networks with manually-engineered features. Experiments show that both components are complementary and contribute to overall recall, outperforming competitive baselines. A trial online deployment of our model demonstrates its business value in improving customer service.

pdf bib
End-to-End Open-Domain Question Answering with BERTserini
Wei Yang | Yuqing Xie | Aileen Lin | Xingyu Li | Luchen Tan | Kun Xiong | Ming Li | Jimmy Lin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

We demonstrate an end-to-end question answering system that integrates BERT with the open-source Anserini information retrieval toolkit. In contrast to most question answering and reading comprehension models today, which operate over small amounts of input text, our system integrates best practices from IR with a BERT-based reader to identify answers from a large corpus of Wikipedia articles in an end-to-end fashion. We report large improvements over previous results on a standard benchmark test collection, showing that fine-tuning pretrained BERT with SQuAD is sufficient to achieve high accuracy in identifying answer spans.


pdf bib
Entity Disambiguation by Knowledge and Text Jointly Embedding
Wei Fang | Jianwen Zhang | Dilin Wang | Zheng Chen | Ming Li
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning


pdf bib
Measuring the Non-compositionality of Multiword Expressions
Fan Bu | Xiaoyan Zhu | Ming Li
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)


pdf bib
Discovering Patterns to Extract Protein-Protein Interactions from Full Biomedical Texts
Minlie Huang | Xiaoyan Zhu | Donald G. Payan | Kunbin Qu | Ming Li
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)