Li Zhang


SmartCiteCon: Implicit Citation Context Extraction from Academic Literature Using Supervised Learning
Chenrui Guo | Haoran Cui | Li Zhang | Jiamin Wang | Wei Lu | Jian Wu
Proceedings of the 8th International Workshop on Mining Scientific Publications

We introduce SmartCiteCon (SCC), a Java API for extracting both explicit and implicit citation context from academic literature in English. The tool is built on a Support Vector Machine (SVM) model trained on a set of 7,058 manually annotated citation context sentences, curated from 34,000 papers from the ACL Anthology. With 19 features, the model achieves F1 = 85.6%. SCC supports PDF, XML, and JSON files out of the box, provided that they conform to certain schemas. The API supports both single-document processing and parallel batch processing. Processing a document takes about 12–45 seconds on average, depending on the format, on a dedicated server with 6 multithreaded cores. Using SCC, we extracted 11.8 million citation context sentences from ~33.3k PMC papers in the CORD-19 dataset, released on June 13, 2020. We will continue to contribute supplementary data to CORD-19 and other datasets. The source code is released at

Small but Mighty: New Benchmarks for Split and Rephrase
Li Zhang | Huaiyu Zhu | Siddhartha Brahma | Yunyao Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Split and Rephrase is a text simplification task of rewriting a complex sentence into simpler ones. As a relatively new task, it is paramount to ensure the soundness of its evaluation benchmark and metric. We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues caused by its automatic generation process. Taking advantage of such cues, we show that even a simple rule-based model can perform on par with the state-of-the-art model. To remedy such limitations, we collect and release two crowdsourced benchmark datasets. We not only make sure that they contain significantly more diverse syntax, but also carefully control for their quality according to a well-defined set of criteria. While no satisfactory automatic metric exists, we apply fine-grained manual evaluation based on these criteria using crowdsourcing, showing that our datasets better represent the task and are significantly more challenging for the models.

Reasoning about Goals, Steps, and Temporal Ordering with WikiHow
Li Zhang | Qing Lyu | Chris Callison-Burch
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose a suite of reasoning tasks on two types of relations between procedural events: goal-step relations (“learn poses” is a step in the larger goal of “doing yoga”) and step-step temporal relations (“buy a yoga mat” typically precedes “learn poses”). We introduce a dataset targeting these two relations based on wikiHow, a website of instructional how-to articles. Our human-validated test set serves as a reliable benchmark for common-sense inference, with a gap of about 10% to 20% between the performance of state-of-the-art transformer models and human performance. Our automatically-generated training set allows models to effectively transfer to out-of-domain tasks requiring knowledge of procedural events, with greatly improved performance on SWAG, Snips, and Story Cloze Test in zero- and few-shot settings.


CAN: Constrained Attention Networks for Multi-Aspect Sentiment Analysis
Mengting Hu | Shiwan Zhao | Li Zhang | Keke Cai | Zhong Su | Renhong Cheng | Xiaowei Shen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Aspect-level sentiment classification is a fine-grained sentiment analysis task. To detect the sentiment towards a particular aspect in a sentence, previous studies have developed various attention-based methods for generating aspect-specific sentence representations. However, the attention may inherently introduce noise and degrade performance. In this paper, we propose constrained attention networks (CAN), a simple yet effective solution that regularizes the attention for multi-aspect sentiment analysis and thereby alleviates this drawback of the attention mechanism. Specifically, we introduce orthogonal regularization on multiple aspects and sparse regularization on each single aspect. Experimental results on two public datasets demonstrate the effectiveness of our approach. We further extend our approach to multi-task settings and outperform the state-of-the-art methods.
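The two regularizers named in the abstract can be sketched as penalties on the attention weights. This is a minimal sketch under assumed formulations, not the CAN implementation: the sparse term is taken as |Σᵢ αᵢ² − 1| for one aspect's attention distribution α (zero only when attention is one-hot), the orthogonal term as the Frobenius norm of AAᵀ − I where row k of A is aspect k's attention, and the weight `lam` is hypothetical. The paper's exact norms and weighting may differ.

```python
import math
from typing import List


def sparse_penalty(alpha: List[float]) -> float:
    """|sum_i alpha_i^2 - 1| for one aspect's attention weights (assumed form).

    For a distribution (non-negative weights summing to 1) this is 0 only
    when attention is one-hot, and it grows as the weights spread out.
    """
    return abs(sum(a * a for a in alpha) - 1.0)


def orthogonal_penalty(attn: List[List[float]]) -> float:
    """Frobenius norm of (A A^T - I); row k of A is aspect k's attention.

    Off-diagonal entries of A A^T measure how much two aspects attend to the
    same words, so the penalty is small when aspects attend to disjoint words.
    """
    n = len(attn)
    total = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(x * y for x, y in zip(attn[i], attn[j]))
            target = 1.0 if i == j else 0.0
            total += (dot - target) ** 2
    return math.sqrt(total)


def regularized_loss(task_loss: float, attn: List[List[float]], lam: float = 0.1) -> float:
    """Task loss plus both penalties; `lam` is a hypothetical weight."""
    sparse = sum(sparse_penalty(row) for row in attn)
    return task_loss + lam * (orthogonal_penalty(attn) + sparse)


# Two aspects attending to disjoint words incur no penalty.
print(regularized_loss(0.7, [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]))  # prints 0.7
```

If both aspects instead spread attention over the same words, both penalties become positive, which is the behavior the constraints are meant to discourage.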

Multi-Label Transfer Learning for Multi-Relational Semantic Similarity
Li Zhang | Steven Wilson | Rada Mihalcea
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Multi-relational semantic similarity datasets define the semantic relations between two short texts in multiple ways, e.g., similarity, relatedness, and so on. Yet, all the systems to date designed to capture such relations target one relation at a time. We propose a multi-label transfer learning approach based on LSTM to make predictions for several relations simultaneously and aggregate the losses to update the parameters. This multi-label regression approach jointly learns the information provided by the multiple relations, rather than treating them as separate tasks. Not only does this approach outperform the single-task approach and the traditional multi-task learning approach, but it also achieves state-of-the-art performance on all but one relation of the Human Activity Phrase dataset.
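The idea of predicting several relations at once and aggregating their losses can be illustrated with a toy model. This sketch is an assumption-laden stand-in, not the paper's LSTM: a single scalar linear map plays the role of the shared encoder, each relation gets its own hypothetical regression head, and the per-relation squared errors are summed so that the shared parameters are updated by every relation at once.

```python
from typing import Dict, List, Tuple


def predict(shared_w: List[float], heads: Dict[str, float], x: List[float]) -> Dict[str, float]:
    # Shared representation h (a scalar here, standing in for an LSTM encoding).
    h = sum(w * xi for w, xi in zip(shared_w, x))
    # One regression head per relation, all reading the same h.
    return {r: a * h for r, a in heads.items()}


def aggregated_loss(pred: Dict[str, float], gold: Dict[str, float]) -> float:
    # Sum of squared errors over all relations, rather than one loss per task.
    return sum((pred[r] - gold[r]) ** 2 for r in gold)


def sgd_step(
    shared_w: List[float],
    heads: Dict[str, float],
    x: List[float],
    gold: Dict[str, float],
    lr: float = 0.01,
) -> Tuple[List[float], Dict[str, float]]:
    # Assumes `heads` and `gold` cover the same relation names.
    h = sum(w * xi for w, xi in zip(shared_w, x))
    # dL/dpred_r = 2 * (pred_r - gold_r) for each relation r.
    err = {r: 2.0 * (heads[r] * h - gold[r]) for r in heads}
    # The shared parameters receive the summed gradient of every relation.
    grad_h = sum(err[r] * heads[r] for r in heads)
    new_w = [w - lr * grad_h * xi for w, xi in zip(shared_w, x)]
    # Each head is updated only by its own relation's error.
    new_heads = {r: heads[r] - lr * err[r] * h for r in heads}
    return new_w, new_heads


w, heads = [0.5, 0.5], {"similarity": 1.0, "relatedness": 1.0}
x, gold = [1.0, 1.0], {"similarity": 2.0, "relatedness": 3.0}
before = aggregated_loss(predict(w, heads, x), gold)  # 5.0
w, heads = sgd_step(w, heads, x, gold)
after = aggregated_loss(predict(w, heads, x), gold)
```

The contrast with treating relations as separate tasks is the `grad_h` line: the shared weights see one combined signal from all relations instead of a separate update per relation.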


Improving Text-to-SQL Evaluation Methodology
Catherine Finegan-Dollak | Jonathan K. Kummerfeld | Li Zhang | Karthik Ramanathan | Sesh Sadasivam | Rui Zhang | Dragomir Radev
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

To be informative, an evaluation must measure how well systems generalize to realistic unseen data. We identify limitations of and propose improvements to current evaluations of text-to-SQL systems. First, we compare human-generated and automatically generated questions, characterizing properties of queries necessary for real-world applications. To facilitate evaluation on multiple datasets, we release standardized and improved versions of seven existing datasets and one new text-to-SQL dataset. Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work. Finally, we demonstrate how the common practice of anonymizing variables during evaluation removes an important challenge of the task. Our observations highlight key difficulties, and our methodology enables effective measurement of future development.
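A split that tests generalization to new queries, rather than to new phrasings of seen queries, can be sketched by grouping examples by query template and assigning whole groups to train or test. This is a minimal sketch, not the released split code: the `sql_template` normalizer, which replaces quoted strings and numeric literals with a placeholder, is a hypothetical stand-in for the variable annotations in the standardized datasets, and `test_fraction` and `seed` are illustrative parameters.

```python
import random
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (question, SQL query)


def sql_template(sql: str) -> str:
    """Hypothetical normalizer: replace string and numeric literals with a placeholder."""
    sql = re.sub(r"'[^']*'", "VALUE", sql)
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "VALUE", sql)
    return " ".join(sql.split()).lower()


def query_split(
    examples: List[Example], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Assign whole query templates to train or test, never both."""
    groups: Dict[str, List[Example]] = defaultdict(list)
    for question, sql in examples:
        groups[sql_template(sql)].append((question, sql))
    templates = sorted(groups)
    random.Random(seed).shuffle(templates)
    n_test = max(1, round(len(templates) * test_fraction))
    test_templates = set(templates[:n_test])
    train = [ex for t in templates if t not in test_templates for ex in groups[t]]
    test = [ex for t in templates if t in test_templates for ex in groups[t]]
    return train, test


examples = [
    ("flights from Boston?", "SELECT * FROM flight WHERE origin = 'BOS'"),
    ("flights from Denver?", "SELECT * FROM flight WHERE origin = 'DEN'"),
    ("cheap fares under 200", "SELECT * FROM fare WHERE cost < 200"),
    ("fares under 300", "SELECT * FROM fare WHERE cost < 300"),
    ("list airlines", "SELECT name FROM airline"),
]
train, test = query_split(examples, test_fraction=0.34)
```

Grouping before shuffling is what makes the split query-based: two paraphrases of the same query can never straddle train and test, so test accuracy reflects generalization to unseen queries.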


Affect Detection from Semantic Interpretation of Drama Improvisation
Li Zhang | Ming Jiang
Proceedings of COLING 2012: Posters


Metaphor Interpretation and Context-based Affect Detection
Li Zhang
Coling 2010: Posters


Don’t worry about metaphor: affect detection for conversational agents
Catherine Smith | Timothy Rumbell | John Barnden | Robert Hendley | Mark Lee | Alan Wallington | Li Zhang
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions


Exploitation in Affect Detection in Open-Ended Improvisational Text
Li Zhang | John A. Barnden | Robert J. Hendley | Alan M. Wallington
Proceedings of the Workshop on Sentiment and Subjectivity in Text

Empirical Study on the Performance Stability of Named Entity Recognition Model across Domains
Hong Lei Guo | Li Zhang | Zhong Su
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

Developments in Affect Detection in E-drama
Li Zhang | John A. Barnden | Robert J. Hendley | Alan M. Wallington