Fei Liu

UT Dallas, Bosch, CMU, University of Central Florida



2020

Understanding Points of Correspondence between Sentences for Abstractive Summarization
Logan Lebanoff | John Muchovej | Franck Dernoncourt | Doo Soon Kim | Lidan Wang | Walter Chang | Fei Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Fusing sentences containing disparate content is a remarkable human ability that helps create informative and succinct summaries. Such a simple task for humans has remained challenging for modern abstractive summarizers, substantially restricting their applicability in real-world scenarios. In this paper, we present an investigation into fusing sentences drawn from a document by introducing the notion of points of correspondence, which are cohesive devices that tie any two sentences together into a coherent text. The types of points of correspondence are delineated by text cohesion theory, covering pronominal and nominal referencing, repetition and beyond. We create a dataset containing the documents, source and fusion sentences, and human annotations of points of correspondence between sentences. Our dataset bridges the gap between coreference resolution and summarization. It is publicly shared to serve as a basis for future work to measure the success of sentence fusion systems.
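
To make the dataset structure concrete, the sketch below shows how one annotated fusion instance might be represented; the field names are illustrative assumptions, not the released schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PointOfCorrespondence:
    # Character spans (start, end) into each source sentence, plus a cohesion
    # type drawn from text cohesion theory (e.g., pronominal referencing,
    # nominal referencing, repetition). Names are illustrative assumptions.
    span_in_a: Tuple[int, int]
    span_in_b: Tuple[int, int]
    poc_type: str

@dataclass
class FusionInstance:
    document_id: str
    sentence_a: str               # first source sentence
    sentence_b: str               # second source sentence
    fusion_sentence: str          # the human-written fused sentence
    correspondences: List[PointOfCorrespondence]
```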

How Domain Terminology Affects Meeting Summarization Performance
Jia Jin Koay | Alexander Roustai | Xiaojin Dai | Dillon Burns | Alec Kerrigan | Fei Liu
Proceedings of the 28th International Conference on Computational Linguistics

Meetings are essential to modern organizations. Numerous meetings are held and recorded daily, more than can ever be comprehended. A meeting summarization system that identifies salient utterances from the transcripts to automatically generate meeting minutes can help. It empowers users to rapidly search and sift through large meeting collections. To date, the impact of domain terminology on the performance of meeting summarization remains understudied, even though meetings are rich in domain knowledge. In this paper, we create gold-standard annotations for domain terminology, known as jargon terms, on a sizable meeting corpus. We then analyze the performance of a meeting summarization system with and without jargon terms. Our findings reveal that domain terminology can have a substantial impact on summarization performance. We publicly release all domain terminology to advance research in meeting summarization.
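
One simple way to use such annotations is to measure how many jargon terms a system summary retains; the sketch below is a rough proxy of that kind, not the paper's exact analysis procedure.

```python
def jargon_coverage(summary, jargon_terms):
    # Fraction of annotated jargon terms that survive in the summary.
    # A rough proxy for how summarization interacts with domain terminology;
    # illustrative only, not the paper's evaluation protocol.
    text = summary.lower()
    hits = sum(1 for term in jargon_terms if term.lower() in text)
    return hits / len(jargon_terms) if jargon_terms else 0.0

print(jargon_coverage("The board approved the Q3 OKRs.", {"OKRs", "KPIs"}))  # 0.5
```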

Learning to Fuse Sentences with Transformers for Summarization
Logan Lebanoff | Franck Dernoncourt | Doo Soon Kim | Lidan Wang | Walter Chang | Fei Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The ability to fuse sentences is highly attractive for summarization systems because it is an essential step to produce succinct abstracts. However, to date, summarizers often fail at sentence fusion. They tend to produce few summary sentences by fusion, or to generate incorrect fusions that cause the summary to lose the original meaning. In this paper, we explore the ability of Transformers to fuse sentences and propose novel algorithms to enhance their ability to perform sentence fusion by leveraging the knowledge of points of correspondence between sentences. Through extensive experiments, we investigate the effects of different design choices on the Transformer’s performance. Our findings highlight the importance of modeling points of correspondence between sentences for effective sentence fusion.
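
One plausible way to expose points of correspondence to a Transformer is to tag linked spans with shared markers before encoding the sentence pair. The sketch below assumes character-level PoC spans and illustrates the general idea, not necessarily the paper's exact algorithms.

```python
def tag_spans(text, spans_with_ids):
    # Insert shared markers right-to-left so earlier character offsets
    # remain valid after each insertion.
    for (start, end), pid in sorted(spans_with_ids, key=lambda x: -x[0][0]):
        text = (text[:start] + f"<poc{pid}> " + text[start:end]
                + f" </poc{pid}>" + text[end:])
    return text

def mark_pair(sent_a, sent_b, pocs):
    # pocs: list of (span_in_a, span_in_b); the shared id tells the encoder
    # which mentions are tied together across the two sentences.
    a = tag_spans(sent_a, [(sa, i) for i, (sa, _) in enumerate(pocs)])
    b = tag_spans(sent_b, [(sb, i) for i, (_, sb) in enumerate(pocs)])
    return a + " </s> " + b
```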

Better Highlighting: Creating Sub-Sentence Summary Highlights
Sangwoo Cho | Kaiqiang Song | Chen Li | Dong Yu | Hassan Foroosh | Fei Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Amongst the best means to summarize is highlighting. In this paper, we aim to generate summary highlights to be overlaid on the original documents to make it easier for readers to sift through a large amount of text. Highlighting allows summaries to be understood in context, preventing distortion of the original meaning, a pitfall that abstractive summarizers usually fall into. In particular, we present a new method to produce self-contained highlights that are understandable on their own, avoiding confusion. Our method combines determinantal point processes and deep contextualized representations to identify an optimal set of sub-sentence segments that are both important and non-redundant to form summary highlights. To demonstrate the flexibility and modeling power of our method, we conduct extensive experiments on summarization datasets. Our analysis provides evidence that highlighting is a promising avenue for future summarization research.

2019

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Wei Zhao | Maxime Peyrard | Fei Liu | Yang Gao | Christian M. Meyer | Steffen Eger
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than surface forms. In this paper we investigate strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality. We validate our new metric, namely MoverScore, on a number of text generation tasks including summarization, machine translation, image captioning, and data-to-text generation, where the outputs are produced by a variety of neural and non-neural systems. Our findings suggest that metrics combining contextualized representations with a distance measure perform the best. Such metrics also demonstrate strong generalization capability across tasks. For ease of use, we make our metrics available as a web service.
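
The core computation behind such a metric can be sketched as follows: embed the tokens of the system output and the reference with a contextualized encoder, then score how far one set of token vectors must travel to match the other. The relaxed nearest-neighbor variant below is a cheap lower bound on the exact earth mover distance and omits the released implementation's IDF weighting and n-gram embeddings.

```python
import numpy as np

def relaxed_mover_distance(X, Y):
    # X: m x d and Y: n x d per-token contextualized embeddings for the
    # system text and the reference. Lower is better (more similar).
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    cost = 1.0 - X @ Y.T                 # pairwise cosine distances
    forward = cost.min(axis=1).mean()    # each system token to its nearest reference token
    backward = cost.min(axis=0).mean()   # each reference token to its nearest system token
    return 0.5 * (forward + backward)
```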

Proceedings of the 2nd Workshop on New Frontiers in Summarization
Lu Wang | Jackie Chi Kit Cheung | Giuseppe Carenini | Fei Liu
Proceedings of the 2nd Workshop on New Frontiers in Summarization

Towards Annotating and Creating Summary Highlights at Sub-sentence Level
Kristjan Arumae | Parminder Bhatia | Fei Liu
Proceedings of the 2nd Workshop on New Frontiers in Summarization

Highlighting is a powerful tool for picking out important content and emphasizing it. Creating summary highlights at the sub-sentence level is particularly desirable, because sub-sentences are more concise than whole sentences. They are also better suited than individual words and phrases, which can lead to disfluent, fragmented summaries. In this paper we seek to generate summary highlights by annotating summary-worthy sub-sentences and teaching classifiers to do the same. We frame the task as jointly selecting important sentences and identifying a single most informative textual unit from each sentence. This formulation dramatically reduces the task complexity involved in sentence compression. Our study provides new benchmarks and baselines for generating highlights at the sub-sentence level.
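
Decoding under this joint formulation is straightforward: pick the most salient sentences, then the single highest-scoring unit inside each. A minimal sketch, assuming classifier scores are precomputed:

```python
def select_highlights(sent_scores, units, unit_scores, k):
    # sent_scores[i]: salience of sentence i; units[i]: candidate sub-sentence
    # strings for sentence i; unit_scores[i]: parallel scores. Both are
    # assumed to come from the trained classifiers described above.
    top = sorted(range(len(sent_scores)), key=lambda i: -sent_scores[i])[:k]
    return [max(zip(unit_scores[i], units[i]))[1] for i in sorted(top)]
```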

Multi-Document Summarization with Determinantal Point Processes and Contextualized Representations
Sangwoo Cho | Chen Li | Dong Yu | Hassan Foroosh | Fei Liu
Proceedings of the 2nd Workshop on New Frontiers in Summarization

Having emerged as one of the best-performing techniques for extractive summarization, determinantal point processes (DPPs) select the most probable set of summary sentences according to a probabilistic measure defined by modeling sentence prominence and pairwise repulsion. Traditionally, both aspects have been modeled using shallow, linguistically informed features, but the rise of deep contextualized representations raises an interesting question: whether, and to what extent, contextualized sentence representations could be used to improve the DPP framework. Our findings suggest that, despite the success of deep semantic representations, it remains necessary to combine them with surface indicators for effective identification of summary-worthy sentences.
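
The prominence/repulsion decomposition corresponds to the standard quality-diversity DPP kernel L = diag(q) S diag(q). The sketch below runs greedy MAP inference over such a kernel; it illustrates the framework only, with the quality scores q and similarity matrix S assumed to be precomputed (e.g., from surface features or contextualized representations).

```python
import numpy as np

def dpp_greedy(q, S, k):
    # Greedily grow the selected set Y, at each step adding the sentence
    # with the largest log det L[Y] gain, where L = diag(q) @ S @ diag(q).
    L = np.outer(q, q) * S
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(len(q)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            score = logdet if sign > 0 else -np.inf
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break
        selected.append(best)
    return selected
```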

Analyzing Sentence Fusion in Abstractive Summarization
Logan Lebanoff | John Muchovej | Franck Dernoncourt | Doo Soon Kim | Seokhwan Kim | Walter Chang | Fei Liu
Proceedings of the 2nd Workshop on New Frontiers in Summarization

While recent work in abstractive summarization has resulted in higher scores on automatic metrics, there is little understanding of how these systems combine information taken from multiple document sentences. In this paper, we analyze the outputs of five state-of-the-art abstractive summarizers, focusing on summary sentences that are formed by sentence fusion. We ask assessors to judge the grammaticality, faithfulness, and method of fusion for summary sentences. Our analysis reveals that system sentences are mostly grammatical, but often fail to remain faithful to the original article.

Improving the Similarity Measure of Determinantal Point Processes for Extractive Multi-Document Summarization
Sangwoo Cho | Logan Lebanoff | Hassan Foroosh | Fei Liu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The most important obstacles facing multi-document summarization include excessive redundancy in source descriptions and the looming shortage of training data. These obstacles prevent encoder-decoder models from being used directly, but optimization-based methods such as determinantal point processes (DPPs) are known to handle them well. In this paper we seek to strengthen a DPP-based method for extractive multi-document summarization by presenting a novel similarity measure inspired by capsule networks. The approach measures redundancy between a pair of sentences based on surface form and semantic information. We show that our DPP system with the improved similarity measure performs competitively, outperforming strong summarization baselines on benchmark datasets. Our findings are particularly meaningful for summarizing documents created by multiple authors containing redundant yet lexically diverse expressions.
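
The measure itself is capsule-network-inspired; as a simpler stand-in that captures the same intuition of combining surface form with semantics, one can blend lexical overlap with an embedding cosine (this is not a reimplementation of the paper's measure):

```python
import numpy as np

def blended_similarity(tokens_a, tokens_b, emb_a, emb_b, alpha=0.5):
    # Jaccard overlap on token sets blended with cosine similarity of
    # sentence embeddings; alpha is an assumed mixing weight.
    sa, sb = set(tokens_a), set(tokens_b)
    jaccard = len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0
    cosine = float(emb_a @ emb_b /
                   (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
    return alpha * jaccard + (1 - alpha) * cosine
```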

Scoring Sentence Singletons and Pairs for Abstractive Summarization
Logan Lebanoff | Kaiqiang Song | Franck Dernoncourt | Doo Soon Kim | Seokhwan Kim | Walter Chang | Fei Liu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

When writing a summary, humans tend to choose content from one or two sentences and merge them into a single summary sentence. However, the mechanisms behind the selection of one or multiple source sentences remain poorly understood. Sentence fusion assumes multi-sentence input; yet sentence selection methods only work with single sentences and not combinations of them. There is thus a crucial gap between sentence selection and fusion to support summarizing by both compressing single sentences and fusing pairs. This paper attempts to bridge the gap by ranking sentence singletons and pairs together in a unified space. Our proposed framework attempts to model human methodology by selecting either a single sentence or a pair of sentences, then compressing or fusing the sentence(s) to produce a summary sentence. We conduct extensive experiments on both single- and multi-document summarization datasets and report findings on sentence selection and abstraction.
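
The unified space can be pictured as a single candidate pool holding both singletons and pairs, ranked by one scorer. A minimal sketch, with score_fn standing in for the trained ranker:

```python
from itertools import combinations

def rank_candidates(sentences, score_fn):
    # Candidate pool of sentence singletons and sentence pairs, ranked in
    # one space. The top candidate is then compressed (singleton) or fused
    # (pair) into a summary sentence by a downstream model (not shown).
    pool = [(i,) for i in range(len(sentences))]
    pool += list(combinations(range(len(sentences)), 2))

    def text(cand):
        return " ".join(sentences[i] for i in cand)

    return sorted(pool, key=lambda cand: -score_fn(text(cand)))
```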

Guiding Extractive Summarization with Question-Answering Rewards
Kristjan Arumae | Fei Liu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Highlighting while reading is a natural behavior for people to track salient content of a document. It would be desirable to teach an extractive summarizer to do the same. However, a major obstacle to the development of a supervised summarizer is the lack of ground-truth labels. Manual annotation of extraction units is cost-prohibitive, whereas acquiring labels by automatically aligning human abstracts and source documents can yield inferior results. In this paper we describe a novel framework to guide a supervised, extractive summarization system with question-answering rewards. We argue that quality summaries should serve as document surrogates to answer important questions, and such question-answer pairs can be conveniently obtained from human abstracts. The system learns to promote summaries that are informative, fluent, and perform competitively on question-answering. Our results compare favorably with those reported by strong summarization baselines as evaluated by automatic metrics and human assessors.
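
The reward can be sketched as answer accuracy when the summary is the only context a QA component sees; answer_fn below is an assumed stand-in for whatever QA model is used, and exact-match scoring is a simplification.

```python
def qa_reward(summary, qa_pairs, answer_fn):
    # qa_pairs: (question, answer) tuples derived from the human abstract.
    # answer_fn(question, context) -> str is an assumed QA component.
    correct = sum(
        answer_fn(q, summary).strip().lower() == a.strip().lower()
        for q, a in qa_pairs
    )
    return correct / len(qa_pairs) if qa_pairs else 0.0
```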

2018

Automatic Detection of Vague Words and Sentences in Privacy Policies
Logan Lebanoff | Fei Liu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Website privacy policies represent the single most important source of information for users to gauge how their personal data are collected, used and shared by companies. However, privacy policies are often vague and people struggle to understand the content. Their opaqueness poses a significant challenge to both users and policy regulators. In this paper, we seek to identify vague content in privacy policies. We construct the first corpus of human-annotated vague words and sentences and present empirical studies on automatic vagueness detection. In particular, we investigate context-aware and context-agnostic models for predicting vague words, and explore auxiliary-classifier generative adversarial networks for characterizing sentence vagueness. Our experimental results demonstrate the effectiveness of the proposed approaches. Finally, we provide suggestions for resolving vagueness and improving the usability of privacy policies.

Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization
Logan Lebanoff | Kaiqiang Song | Fei Liu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Generating a text abstract from a set of documents remains a challenging task. The neural encoder-decoder framework has recently been exploited to summarize single documents, but its success can in part be attributed to the availability of large parallel data automatically acquired from the Web. In contrast, parallel data for multi-document summarization are scarce and costly to obtain. There is a pressing need to adapt an encoder-decoder model trained on single-document summarization data to work with multiple-document input. In this paper, we present an initial investigation into a novel adaptation method. It exploits the maximal marginal relevance method to select representative sentences from multi-document input, and leverages an abstractive encoder-decoder model to fuse disparate sentences into an abstractive summary. The adaptation method is robust and itself requires no training data. Our system compares favorably to state-of-the-art extractive and abstractive approaches judged by automatic metrics and human assessors.
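
The selection step uses maximal marginal relevance, which is standard and easy to sketch: repeatedly pick the sentence most relevant to the input and least redundant with what has already been chosen. Similarities are assumed to be precomputed; the abstractive fusion step is not shown.

```python
def mmr_select(relevance, sim, k, lam=0.7):
    # relevance[i]: similarity of sentence i to the multi-document input;
    # sim[i][j]: pairwise sentence similarity; lam is the usual
    # relevance/redundancy trade-off (0.7 here is an assumed setting).
    selected, remaining = [], set(range(len(relevance)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```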

Abstract Meaning Representation for Multi-Document Summarization
Kexin Liao | Logan Lebanoff | Fei Liu
Proceedings of the 27th International Conference on Computational Linguistics

Generating an abstract from a collection of documents is a desirable capability for many real-world applications. However, abstractive approaches to multi-document summarization have not been thoroughly investigated. This paper studies the feasibility of using Abstract Meaning Representation (AMR), a semantic representation of natural language grounded in linguistic theory, as a form of content representation. Our approach condenses source documents to a set of summary graphs following the AMR formalism. The summary graphs are then transformed to a set of summary sentences in a surface realization step. The framework is fully data-driven and flexible. Each component can be optimized independently using small-scale, in-domain training data. We perform experiments on benchmark summarization datasets and report promising results. We also describe opportunities and challenges for advancing this line of research.

Structure-Infused Copy Mechanisms for Abstractive Summarization
Kaiqiang Song | Lin Zhao | Fei Liu
Proceedings of the 27th International Conference on Computational Linguistics

Seq2seq learning has produced promising results on summarization. However, in many cases, system summaries still struggle to keep the meaning of the original intact. They may omit important words or relations that play critical roles in the syntactic structure of source sentences. In this paper, we present structure-infused copy mechanisms to facilitate copying important words and relations from the source sentence to the summary sentence. The approach naturally combines source dependency structure with the copy mechanism of an abstractive sentence summarizer. Experimental results demonstrate the effectiveness of incorporating source-side syntactic information in the system, and our proposed approach compares favorably to state-of-the-art methods.

Reinforced Extractive Summarization with Question-Focused Rewards
Kristjan Arumae | Fei Liu
Proceedings of ACL 2018, Student Research Workshop

We investigate a new training paradigm for extractive summarization. Traditionally, human abstracts are used to derive gold-standard labels for extraction units. However, the labels are often inaccurate, because human abstracts and source documents cannot be easily aligned at the word level. In this paper we convert human abstracts to a set of Cloze-style comprehension questions. System summaries are encouraged to preserve salient source content useful for answering questions and share common words with the abstracts. We use reinforcement learning to explore the space of possible extractive summaries and introduce a question-focused reward function to promote concise, fluent, and informative summaries. Our experiments show that the proposed method is effective. It surpasses state-of-the-art systems on the standard summarization dataset.
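
The conversion of abstracts into Cloze questions can be sketched as blanking out salient terms; which terms count as salient is an assumption here (the paper defines its own criteria).

```python
def make_cloze_questions(abstract_sentences, salient_terms):
    # Blank out each salient term occurring in an abstract sentence,
    # yielding (question, answer) pairs for the reward function.
    questions = []
    for sent in abstract_sentences:
        for term in salient_terms:
            if term in sent:
                questions.append((sent.replace(term, "_____", 1), term))
    return questions
```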

Proceedings of ACL 2018, System Demonstrations
Fei Liu | Thamar Solorio
Proceedings of ACL 2018, System Demonstrations

2017

Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional Random Fields
Fei Liu | Timothy Baldwin | Trevor Cohn
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Despite successful applications across a broad range of NLP tasks, conditional random fields (“CRFs”), in particular the linear-chain variant, are only able to model local features. While this has important benefits in terms of inference tractability, it limits the ability of the model to capture long-range dependencies between items. Attempts to extend CRFs to capture long-range dependencies have largely come at the cost of computational complexity and approximate inference. In this work, we propose an extension to CRFs by integrating external memory, taking inspiration from memory networks, thereby allowing CRFs to incorporate information far beyond neighbouring steps. Experiments across two tasks show substantial improvements over strong CRF and LSTM baselines.
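
A minimal sketch of the idea: let each step attend over an external memory of distant-context representations, fold the read vector into the emission scores, and decode with standard linear-chain Viterbi. All weight shapes below are illustrative assumptions; the paper's exact parameterization may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_augmented_emissions(H, M, W_emit, W_mem):
    # H: T x d per-step encodings; M: m x d external memory slots holding
    # representations of context far beyond neighbouring steps.
    attn = softmax(H @ M.T, axis=1)      # T x m attention over memory
    read = attn @ M                      # T x d memory read vectors
    return H @ W_emit + read @ W_mem     # T x num_labels emission scores

def viterbi(emissions, transitions):
    # Standard linear-chain decode over the augmented emissions.
    T, L = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```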

Proceedings of the Workshop on New Frontiers in Summarization
Lu Wang | Jackie Chi Kit Cheung | Giuseppe Carenini | Fei Liu
Proceedings of the Workshop on New Frontiers in Summarization

2016

Automatic Summarization of Student Course Feedback
Wencan Luo | Fei Liu | Zitao Liu | Diane Litman
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

An Improved Phrase-based Approach to Annotating and Summarizing Student Course Responses
Wencan Luo | Fei Liu | Diane Litman
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Teaching large classes remains a great challenge, primarily because it is difficult to attend to all the student needs in a timely manner. Automatic text summarization systems can be leveraged to summarize the student feedback, submitted immediately after each lecture, but what makes a good summary of student responses remains to be discovered. In this work we explore a new methodology that effectively extracts summary phrases from the student responses. Each phrase is tagged with the number of students who raise the issue. The phrases are evaluated along two dimensions: with respect to text content, they should be informative and well-formed, measured by the ROUGE metric; additionally, they should attend to the most pressing student needs, measured by a newly proposed metric. This work is enabled by a phrase-based annotation and highlighting scheme, which is new to the summarization task. The phrase-based framework allows us to summarize the student responses into a set of bullet points and present them to the instructor promptly.
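
The output format can be sketched as simple aggregation: collect the extracted phrases across responses and tag each with its student count. Phrase extraction and normalization are assumed to happen upstream; the paper's pipeline is more involved.

```python
from collections import Counter

def bullet_point_summary(phrases, top_n=5):
    # phrases: one extracted summary phrase per student response.
    counts = Counter(p.strip().lower() for p in phrases)
    return [f"- {phrase} ({n} students)" for phrase, n in counts.most_common(top_n)]
```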

2015

Extractive Summarization by Maximizing Semantic Volume
Dani Yogatama | Fei Liu | Noah A. Smith
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Toward Abstractive Summarization Using Semantic Representations
Fei Liu | Jeffrey Flanigan | Sam Thomson | Norman Sadeh | Noah A. Smith
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

Unsupervised Alignment of Privacy Policies using Hidden Markov Models
Rohan Ramanath | Fei Liu | Norman Sadeh | Noah A. Smith
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

A Step Towards Usable Privacy Policy: Automatic Alignment of Privacy Statements
Fei Liu | Rohan Ramanath | Norman Sadeh | Noah A. Smith
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Improving Multi-documents Summarization by Sentence Compression based on Expanded Constituent Parse Trees
Chen Li | Yang Liu | Fei Liu | Lin Zhao | Fuliang Weng
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Document Summarization via Guided Sentence Compression
Chen Li | Fei Liu | Fuliang Weng | Yang Liu
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

A Participant-based Approach for Event Summarization Using Twitter Streams
Chao Shen | Fei Liu | Fuliang Weng | Tao Li
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

A Broad-Coverage Normalization System for Social Media Language
Fei Liu | Fuliang Weng | Xiao Jiang
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

Why is “SXSW” trending? Exploring Multiple Text Sources for Twitter Topic Summarization
Fei Liu | Yang Liu | Fuliang Weng
Proceedings of the Workshop on Language in Social Media (LSM 2011)

Learning from Chinese-English Parallel Data for Chinese Tense Prediction
Feifan Liu | Fei Liu | Yang Liu
Proceedings of 5th International Joint Conference on Natural Language Processing

Insertion, Deletion, or Substitution? Normalizing Text Messages without Pre-categorization nor Supervision
Fei Liu | Fuliang Weng | Bingqing Wang | Yang Liu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2009

From Extractive to Abstractive Meeting Summaries: Can It Be Done by Sentence Compression?
Fei Liu | Yang Liu
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Unsupervised Approaches for Automatic Keyword Extraction Using Meeting Transcripts
Feifan Liu | Deana Pennell | Fei Liu | Yang Liu
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

What Are Meeting Summaries? An Analysis of Human Extractive Summaries in Meeting Corpus
Fei Liu | Yang Liu
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue