Furu Wei


2020

pdf bib
Harvesting and Refining Question-Answer Pairs for Unsupervised QA
Zhongli Li | Wenhui Wang | Li Dong | Furu Wei | Ke Xu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Question Answering (QA) has shown great success thanks to the availability of large-scale datasets and the effectiveness of neural models. Recent research works have attempted to extend these successes to the settings with few or no labeled data available. In this work, we introduce two approaches to improve unsupervised QA. First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named as RefQA). Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA. We conduct experiments on SQuAD 1.1, and NewsQA by fine-tuning BERT without access to manually annotated data. Our approach outperforms previous unsupervised approaches by a large margin, and is competitive with early supervised models. We also show the effectiveness of our approach in the few-shot learning setting.

pdf bib
TableBank: Table Benchmark for Image-based Table Detection and Recognition
Minghao Li | Lei Cui | Shaohan Huang | Furu Wei | Ming Zhou | Zhoujun Li
Proceedings of the 12th Language Resources and Evaluation Conference

We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on the internet. Existing research for image-based table detection and recognition usually fine-tunes pre-trained models on out-of-domain data with a few thousand human-labeled examples, which is difficult to generalize on real-world applications. With TableBank that contains 417K high quality labeled tables, we build several strong baselines using state-of-the-art models with deep neural networks. We make TableBank publicly available and hope it will empower more deep learning approaches in the table detection and recognition task. The dataset and models can be downloaded from https://github.com/doc-analysis/TableBank.

pdf bib
Improving Grammatical Error Correction with Machine Translation Pairs
Wangchunshu Zhou | Tao Ge | Chang Mu | Ke Xu | Furu Wei | Ming Zhou
Findings of the Association for Computational Linguistics: EMNLP 2020

We propose a novel data synthesis method to generate diverse error-corrected sentence pairs for improving grammatical error correction, which is based on a pair of machine translation models (e.g., Chinese to English) of different qualities (i.e., poor and good). The poor translation model can resemble the ESL (English as a second language) learner and tends to generate translations of low quality in terms of fluency and grammaticality, while the good translation model generally generates fluent and grammatically correct translations. With the pair of translation models, we can generate unlimited numbers of poor to good English sentence pairs from text in the source language (e.g., Chinese) of the translators. Our approach can generate various error-corrected patterns and nicely complement the other data synthesis approaches for GEC. Experimental results demonstrate the data generated by our approach can effectively help a GEC model to improve the performance and achieve the state-of-the-art single-model performance in BEA-19 and CoNLL-14 benchmark datasets.

pdf bib
Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers
Shusheng Xu | Xingxing Zhang | Yi Wu | Furu Wei | Ming Zhou
Findings of the Association for Computational Linguistics: EMNLP 2020

Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training. Existing methods are mostly graph-based with sentences as nodes and edge weights measured by sentence similarities. In this work, we find that transformer attentions can be used to rank sentences for unsupervised extractive summarization. Specifically, we first pre-train a hierarchical transformer model using unlabeled documents only. Then we propose a method to rank sentences using sentence-level self-attentions and pre-training objectives. Experiments on CNN/DailyMail and New York Times datasets show our model achieves state-of-the-art performance on unsupervised summarization. We also find in experiments that our model is less dependent on sentence positions. When using a linear combination of our model and a recent unsupervised model explicitly modeling sentence positions, we obtain even better results.

pdf bib
Scheduled DropHead: A Regularization Method for Transformer Models
Wangchunshu Zhou | Tao Ge | Furu Wei | Ming Zhou | Ke Xu
Findings of the Association for Computational Linguistics: EMNLP 2020

We introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism which is a key component of transformer. In contrast to the conventional dropout mechanism which randomly drops units or connections, DropHead drops entire attention heads during training to prevent the multi-head attention model from being dominated by a small portion of attention heads. It can help reduce the risk of overfitting and allow the models to better benefit from the multi-head attention. Given the interaction between multi-headedness and training dynamics, we further propose a novel dropout rate scheduler to adjust the dropout rate of DropHead throughout training, which results in a better regularization effect. Experimental results demonstrate that our proposed approach can improve transformer models by 0.9 BLEU score on WMT14 En-De translation task and around 1.0 accuracy for various text classification tasks.

pdf bib
DocBank: A Benchmark Dataset for Document Layout Analysis
Minghao Li | Yiheng Xu | Lei Cui | Shaohan Huang | Furu Wei | Zhoujun Li | Ming Zhou
Proceedings of the 28th International Conference on Computational Linguistics

Document layout analysis usually relies on computer vision models to understand documents while ignoring textual information that is vital to capture. Meanwhile, high quality labeled datasets with both visual and textual information are still insufficient. In this paper, we present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis. DocBank is constructed using a simple yet effective way with weak supervision from the LaTeX documents available on the arXiv.com. With DocBank, models from different modalities can be compared fairly and multi-modal approaches will be further investigated and boost the performance of document layout analysis. We build several strong baselines and manually split train/dev/test sets for evaluation. Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents. The DocBank dataset is publicly available at https://github.com/doc-analysis/DocBank.

pdf bib
Unsupervised Fine-tuning for Text Clustering
Shaohan Huang | Furu Wei | Lei Cui | Xingxing Zhang | Ming Zhou
Proceedings of the 28th International Conference on Computational Linguistics

Fine-tuning with pre-trained language models (e.g. BERT) has achieved great success in many language understanding tasks in supervised settings (e.g. text classification). However, relatively little work has been focused on applying pre-trained models in unsupervised settings, such as text clustering. In this paper, we propose a novel method to fine-tune pre-trained models unsupervisedly for text clustering, which simultaneously learns text representations and cluster assignments using a clustering oriented loss. Experiments on three text clustering datasets (namely TREC-6, Yelp, and DBpedia) show that our model outperforms the baseline methods and achieves state-of-the-art results.

pdf bib
At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization
Qingyu Zhou | Furu Wei | Ming Zhou
Proceedings of the 28th International Conference on Computational Linguistics

Extractive methods have been proven effective in automatic document summarization. Previous works perform this task by identifying informative contents at sentence level. However, it is unclear whether performing extraction at sentence level is the best solution. In this work, we show that unnecessity and redundancy issues exist when extracting full sentences, and extracting sub-sentential units is a promising alternative. Specifically, we propose extracting sub-sentential units based on the constituency parsing tree. A neural extractive model which leverages the sub-sentential information and extracts them is presented. Extensive experiments and analyses show that extracting sub-sentential units performs competitively comparing to full sentence extraction under the evaluation of both automatic and human evaluations. Hopefully, our work could provide some inspiration of the basic extraction units in extractive summarization for future research.

pdf bib
Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph
Haozhe Ji | Pei Ke | Shaohan Huang | Furu Wei | Xiaoyan Zhu | Minlie Huang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Despite the success of generative pre-trained language models on a series of text generation tasks, they still suffer in cases where reasoning over underlying commonsense knowledge is required during generation. Existing approaches that integrate commonsense knowledge into generative pre-trained language models simply transfer relational knowledge by post-training on individual knowledge triples while ignoring rich connections within the knowledge graph. We argue that exploiting both the structural and semantic information of the knowledge graph facilitates commonsense-aware text generation. In this paper, we propose Generation with Multi-Hop Reasoning Flow (GRF) that enables pre-trained models with dynamic multi-hop reasoning on multi-relational paths extracted from the external commonsense knowledge graph. We empirically show that our model outperforms existing baselines on three text generation tasks that require reasoning over commonsense knowledge. We also demonstrate the effectiveness of the dynamic multi-hop reasoning module with reasoning paths inferred by the model that provide rationale to the generation.

pdf bib
Pre-training for Abstractive Document Summarization by Reinstating Source Text
Yanyan Zou | Xingxing Zhang | Wei Lu | Furu Wei | Ming Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Abstractive document summarization is usually modeled as a sequence-to-sequence (SEQ2SEQ) learning problem. Unfortunately, training large SEQ2SEQ based summarization models on limited supervised summarization data is challenging. This paper presents three sequence-to-sequence pre-training (in shorthand, STEP) objectives which allow us to pre-train a SEQ2SEQ based abstractive summarization model on unlabeled text. The main idea is that, given an input text artificially constructed from a document, a model is pre-trained to reinstate the original document. These objectives include sentence reordering, next sentence generation and masked document generation, which have close relations with the abstractive document summarization task. Experiments on two benchmark summarization datasets (i.e., CNN/DailyMail and New York Times) show that all three objectives can improve performance upon baselines. Compared to models pre-trained on large-scale data (larger than 160GB), our method, with only 19GB text for pre-training, achieves comparable results, which demonstrates its effectiveness.

pdf bib
Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction
Mengyun Chen | Tao Ge | Xingxing Zhang | Furu Wei | Ming Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose a novel language-independent approach to improve the efficiency for Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC). ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. Then, ESC leverages a seq2seq model to take the sentence with annotated erroneous spans as input and only outputs the corrected text for these spans. Experiments show our approach performs comparably to conventional seq2seq approaches in both English and Chinese GEC benchmarks with less than 50% time cost for inference.

pdf bib
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu | Wangchunshu Zhou | Tao Ge | Furu Wei | Ming Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we propose a novel model compression approach to effectively compress BERT by progressive module replacing. Our approach first divides the original BERT into several modules and builds their compact substitutes. Then, we randomly replace the original modules with their substitutes to train the compact modules to mimic the behavior of the original modules. We progressively increase the probability of replacement through the training. In this way, our approach brings a deeper level of interaction between the original and compact models. Compared to the previous knowledge distillation approaches for BERT compression, our approach does not introduce any additional loss function. Our approach outperforms existing knowledge distillation approaches on GLUE benchmark, showing a new perspective of model compression.

2019

pdf bib
Video Dialog via Progressive Inference and Cross-Transformer
Weike Jin | Zhou Zhao | Mao Gu | Jun Xiao | Furu Wei | Yueting Zhuang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Video dialog is a new and challenging task, which requires the agent to answer questions combining video information with dialog history. And different from single-turn video question answering, the additional dialog history is important for video dialog, which often includes contextual information for the question. Existing visual dialog methods mainly use RNN to encode the dialog history as a single vector representation, which might be rough and straightforward. Some more advanced methods utilize hierarchical structure, attention and memory mechanisms, which still lack an explicit reasoning process. In this paper, we introduce a novel progressive inference mechanism for video dialog, which progressively updates query information based on dialog history and video content until the agent think the information is sufficient and unambiguous. In order to tackle the multi-modal fusion problem, we propose a cross-transformer module, which could learn more fine-grained and comprehensive interactions both inside and between the modalities. And besides answer generation, we also consider question generation, which is more challenging but significant for a complete video dialog system. We evaluate our method on two large-scale datasets, and the extensive experiments show the effectiveness of our method.

pdf bib
Visualizing and Understanding the Effectiveness of BERT
Yaru Hao | Li Dong | Furu Wei | Ke Xu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks. However, it is unclear why the pre-training-then-fine-tuning paradigm can improve performance and generalization capability across different tasks. In this paper, we propose to visualize loss landscapes and optimization trajectories of fine-tuning BERT on specific datasets. First, we find that pre-training reaches a good initial point across downstream tasks, which leads to wider optima and easier optimization compared with training from scratch. We also demonstrate that the fine-tuning procedure is robust to overfitting, even though BERT is highly over-parameterized for downstream tasks. Second, the visualization results indicate that fine-tuning BERT tends to generalize better because of the flat and wide optima, and the consistency between the training loss surface and the generalization error surface. Third, the lower layers of BERT are more invariant during fine-tuning, which suggests that the layers that are close to input learn more transferable representations of language.

pdf bib
Inspecting Unification of Encoding and Matching with Transformer: A Case Study of Machine Reading Comprehension
Hangbo Bao | Li Dong | Furu Wei | Wenhui Wang | Nan Yang | Lei Cui | Songhao Piao | Ming Zhou
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

Most machine reading comprehension (MRC) models separately handle encoding and matching with different network architectures. In contrast, pretrained language models with Transformer layers, such as GPT (Radford et al., 2018) and BERT (Devlin et al., 2018), have achieved competitive performance on MRC. A research question that naturally arises is: apart from the benefits of pre-training, how many performance gain comes from the unified network architecture. In this work, we evaluate and analyze unifying encoding and matching components with Transformer for the MRC task. Experimental results on SQuAD show that the unified model outperforms previous networks that separately treat encoding and matching. We also introduce a metric to inspect whether a Transformer layer tends to perform encoding or matching. The analysis results show that the unified model learns different modeling strategies compared with previous manually-designed models.

pdf bib
BERT-based Lexical Substitution
Wangchunshu Zhou | Tao Ge | Ke Xu | Furu Wei | Ming Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Previous studies on lexical substitution tend to obtain substitute candidates by finding the target word’s synonyms from lexical resources (e.g., WordNet) and then rank the candidates based on its contexts. These approaches have two limitations: (1) They are likely to overlook good substitute candidates that are not the synonyms of the target words in the lexical resources; (2) They fail to take into account the substitution’s influence on the global context of the sentence. To address these issues, we propose an end-to-end BERT-based lexical substitution approach which can propose and validate substitute candidates without using any annotated data or manually curated resources. Our approach first applies dropout to the target word’s embedding for partially masking the word, allowing BERT to take balanced consideration of the target word’s semantics and contexts for proposing substitute candidates, and then validates the candidates based on their substitution’s influence on the global contextualized representation of the sentence. Experiments show our approach performs well in both proposing and ranking substitute candidates, achieving the state-of-the-art results in both LS07 and LS14 benchmarks.

pdf bib
Retrieval-Enhanced Adversarial Training for Neural Response Generation
Qingfu Zhu | Lei Cui | Wei-Nan Zhang | Furu Wei | Ting Liu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Dialogue systems are usually built on either generation-based or retrieval-based approaches, yet they do not benefit from the advantages of different models. In this paper, we propose a Retrieval-Enhanced Adversarial Training (REAT) method for neural response generation. Distinct from existing approaches, the REAT method leverages an encoder-decoder framework in terms of an adversarial training paradigm, while taking advantage of N-best response candidates from a retrieval-based system to construct the discriminator. An empirical study on a large scale public available benchmark dataset shows that the REAT method significantly outperforms the vanilla Seq2Seq model as well as the conventional adversarial training approach.

pdf bib
Learning to Ask Unanswerable Questions for Machine Reading Comprehension
Haichao Zhu | Li Dong | Furu Wei | Wenhui Wang | Bing Qin | Ting Liu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Machine reading comprehension with unanswerable questions is a challenging task. In this work, we propose a data augmentation technique by automatically generating relevant unanswerable questions according to an answerable question paired with its corresponding paragraph that contains the answer. We introduce a pair-to-sequence model for unanswerable question generation, which effectively captures the interactions between the question and the paragraph. We also present a way to construct training data for our question generation models by leveraging the existing reading comprehension dataset. Experimental results show that the pair-to-sequence model performs consistently better compared with the sequence-to-sequence baseline. We further use the automatically generated unanswerable questions as a means of data augmentation on the SQuAD 2.0 dataset, yielding 1.9 absolute F1 improvement with BERT-base model and 1.7 absolute F1 improvement with BERT-large model.

pdf bib
HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
Xingxing Zhang | Furu Wei | Ming Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural extractive summarization models usually employ a hierarchical encoder for document encoding and they are trained using sentence-level labels, which are created heuristically using rule-based methods. Training the hierarchical encoder with these inaccurate labels is challenging. Inspired by the recent work on pre-training transformer sentence encoders (Devlin et al., 2018), we propose Hibert (as shorthand for HIerachical Bidirectional Encoder Representations from Transformers) for document encoding and a method to pre-train it using unlabeled data. We apply the pre-trained Hibert to our summarization model and it outperforms its randomly initialized counterpart by 1.25 ROUGE on the CNN/Dailymail dataset and by 2.0 ROUGE on a version of New York Times dataset. We also achieve the state-of-the-art performance on these two datasets.

pdf bib
Automatic Grammatical Error Correction for Sequence-to-sequence Text Generation: An Empirical Study
Tao Ge | Xingxing Zhang | Furu Wei | Ming Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Sequence-to-sequence (seq2seq) models have achieved tremendous success in text generation tasks. However, there is no guarantee that they can always generate sentences without grammatical errors. In this paper, we present a preliminary empirical study on whether and how much automatic grammatical error correction can help improve seq2seq text generation. We conduct experiments across various seq2seq text generation tasks including machine translation, formality style transfer, sentence compression and simplification. Experiments show the state-of-the-art grammatical error correction system can improve the grammaticality of generated text and can bring task-oriented improvements in the tasks where target sentences are in a formal style.

2018

pdf bib
EventWiki: A Knowledge Base of Major Events
Tao Ge | Lei Cui | Baobao Chang | Zhifang Sui | Furu Wei | Ming Zhou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Neural Latent Extractive Document Summarization
Xingxing Zhang | Mirella Lapata | Furu Wei | Ming Zhou
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Extractive summarization models need sentence level labels, which are usually created with rule-based methods since most summarization datasets only have document summary pairs. These labels might be suboptimal. We propose a latent variable extractive model, where sentences are viewed as latent variables and sentences with activated variables are used to infer gold summaries. During training, the loss can come directly from gold summaries. Experiments on CNN/Dailymail dataset show our latent extractive model outperforms a strong extractive baseline trained on rule-based labels and also performs competitively with several recent models.

pdf bib
Attention-Guided Answer Distillation for Machine Reading Comprehension
Minghao Hu | Yuxing Peng | Furu Wei | Zhen Huang | Dongsheng Li | Nan Yang | Ming Zhou
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Despite that current reading comprehension systems have achieved significant advancements, their promising performances are often obtained at the cost of making an ensemble of numerous models. Besides, existing approaches are also vulnerable to adversarial attacks. This paper tackles these problems by leveraging knowledge distillation, which aims to transfer knowledge from an ensemble model to a single model. We first demonstrate that vanilla knowledge distillation applied to answer span prediction is effective for reading comprehension systems. We then propose two novel approaches that not only penalize the prediction on confusing answers but also guide the training with alignment information distilled from the ensemble. Experiments show that our best student model has only a slight drop of 0.4% F1 on the SQuAD test set compared to the ensemble teacher, while running 12x faster during inference. It even outperforms the teacher on adversarial SQuAD datasets and NarrativeQA benchmark.

pdf bib
Fine-grained Coordinated Cross-lingual Text Stream Alignment for Endless Language Knowledge Acquisition
Tao Ge | Qing Dou | Heng Ji | Lei Cui | Baobao Chang | Zhifang Sui | Furu Wei | Ming Zhou
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper proposes to study fine-grained coordinated cross-lingual text stream alignment through a novel information network decipherment paradigm. We use Burst Information Networks as media to represent text streams and present a simple yet effective network decipherment algorithm with diverse clues to decipher the networks for accurate text stream alignment. Experiments on Chinese-English news streams show our approach not only outperforms previous approaches on bilingual lexicon extraction from coordinated text streams but also can harvest high-quality alignments from large amounts of streaming data for endless language knowledge mining, which makes it promising to be a new paradigm for automatic language knowledge acquisition.

pdf bib
Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization
Ziqiang Cao | Wenjie Li | Sujian Li | Furu Wei
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Most previous seq2seq summarization systems purely depend on the source text to generate summaries, which tends to work unstably. Inspired by the traditional template-based summarization approaches, this paper proposes to use existing summaries as soft templates to guide the seq2seq model. To this end, we use a popular IR platform to Retrieve proper summaries as candidate templates. Then, we extend the seq2seq framework to jointly conduct template Reranking and template-aware summary generation (Rewriting). Experiments show that, in terms of informativeness, our model significantly outperforms the state-of-the-art methods, and even soft templates themselves demonstrate high competitiveness. In addition, the import of high-quality external summaries improves the stability and readability of generated summaries.

pdf bib
Neural Document Summarization by Jointly Learning to Score and Select Sentences
Qingyu Zhou | Nan Yang | Furu Wei | Shaohan Huang | Ming Zhou | Tiejun Zhao
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sentence scoring and sentence selection are two main steps in extractive document summarization systems. However, previous works treat them as two separated subtasks. In this paper, we present a novel end-to-end neural network framework for extractive document summarization by jointly learning to score and select sentences. It first reads the document sentences with a hierarchical encoder to obtain the representation of sentences. Then it builds the output summary by extracting sentences one by one. Different from previous methods, our approach integrates the selection strategy into the scoring model, which directly predicts the relative importance given previously selected sentences. Experiments on the CNN/Daily Mail dataset show that the proposed framework significantly outperforms the state-of-the-art extractive summarization models.

pdf bib
Fluency Boost Learning and Inference for Neural Grammatical Error Correction
Tao Ge | Furu Wei | Ming Zhou
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Most of the neural sequence-to-sequence (seq2seq) models for grammatical error correction (GEC) have two limitations: (1) a seq2seq model may not be well generalized with only limited error-corrected data; (2) a seq2seq model may fail to completely correct a sentence with multiple errors through normal seq2seq inference. We attempt to address these limitations by proposing a fluency boost learning and inference mechanism. Fluency boosting learning generates fluency-boost sentence pairs during training, enabling the error correction model to learn how to improve a sentence’s fluency from more instances, while fluency boosting inference allows the model to correct a sentence incrementally with multiple inference steps until the sentence’s fluency stops increasing. Experiments show our approaches improve the performance of seq2seq models for GEC, achieving state-of-the-art results on both CoNLL-2014 and JFLEG benchmark datasets.

pdf bib
Neural Open Information Extraction
Lei Cui | Furu Wei | Ming Zhou
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Conventional Open Information Extraction (Open IE) systems are usually built on hand-crafted patterns from other NLP tools such as syntactic parsing, yet they face problems of error propagation. In this paper, we propose a neural Open IE approach with an encoder-decoder framework. Distinct from existing methods, the neural Open IE approach learns highly confident arguments and relation tuples bootstrapped from a state-of-the-art Open IE system. An empirical study on a large benchmark dataset shows that the neural Open IE system significantly outperforms several baselines, while maintaining comparable computational efficiency.

2017

pdf bib
Gated Self-Matching Networks for Reading Comprehension and Question Answering
Wenhui Wang | Nan Yang | Furu Wei | Baobao Chang | Ming Zhou
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we present the gated self-matching networks for reading comprehension style question answering, which aims to answer questions from a given passage. We first match the question and passage with gated attention-based recurrent networks to obtain the question-aware passage representation. Then we propose a self-matching attention mechanism to refine the representation by matching the passage against itself, which effectively encodes information from the whole passage. We finally employ the pointer networks to locate the positions of answers from the passages. We conduct extensive experiments on the SQuAD dataset. The single model achieves 71.3% on the evaluation metrics of exact match on the hidden test set, while the ensemble model further boosts the results to 75.9%. At the time of submission of the paper, our model holds the first place on the SQuAD leaderboard for both single and ensemble model.

pdf bib
Selective Encoding for Abstractive Sentence Summarization
Qingyu Zhou | Nan Yang | Furu Wei | Ming Zhou
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a selective encoding model to extend the sequence-to-sequence framework for abstractive sentence summarization. It consists of a sentence encoder, a selective gate network, and an attention equipped decoder. The sentence encoder and decoder are built with recurrent neural networks. The selective gate network constructs a second level sentence representation by controlling the information flow from encoder to decoder. The second level representation is tailored for sentence summarization task, which leads to better performance. We evaluate our model on the English Gigaword, DUC 2004 and MSR abstractive sentence summarization datasets. The experimental results show that the proposed selective encoding model outperforms the state-of-the-art baseline models.

pdf bib
SuperAgent: A Customer Service Chatbot for E-commerce Websites
Lei Cui | Shaohan Huang | Furu Wei | Chuanqi Tan | Chaoqun Duan | Ming Zhou
Proceedings of ACL 2017, System Demonstrations

pdf bib
Learning to Generate Product Reviews from Attributes
Li Dong | Shaohan Huang | Furu Wei | Mirella Lapata | Ming Zhou | Ke Xu
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Automatically generating product reviews is a meaningful, yet not well-studied task in sentiment analysis. Traditional natural language generation methods rely extensively on hand-crafted rules and predefined templates. This paper presents an attention-enhanced attribute-to-sequence model to generate product reviews for given attribute information, such as user, product, and rating. The attribute encoder learns to represent input attributes as vectors. Then, the sequence decoder generates reviews by conditioning its output on these vectors. We also introduce an attention mechanism to jointly generate reviews and align words with input attributes. The proposed model is trained end-to-end to maximize the likelihood of target product reviews given the attributes. We build a publicly available dataset for the review generation task by leveraging the Amazon book reviews and their metadata. Experiments on the dataset show that our approach outperforms baseline methods and the attention mechanism significantly improves the performance of our model.

pdf bib
Entity Linking for Queries by Searching Wikipedia Sentences
Chuanqi Tan | Furu Wei | Pengjie Ren | Weifeng Lv | Ming Zhou
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a simple yet effective approach for linking entities in queries. The key idea is to search sentences similar to a query from Wikipedia articles and directly use the human-annotated entities in the similar sentences as candidate entities for the query. Then, we employ a rich set of features, such as link-probability, context-matching, word embeddings, and relatedness among candidate entities as well as their related entities, to rank the candidates under a regression based framework. The advantages of our approach lie in two aspects, which contribute to the ranking process and final linking result. First, it can greatly reduce the number of candidate entities by filtering out irrelevant entities with the words in the query. Second, we can obtain the query sensitive prior probability in addition to the static link-probability derived from all Wikipedia articles. We conduct experiments on two benchmark datasets on entity linking for queries, namely the ERD14 dataset and the GERDAQ dataset. Experimental results show that our method outperforms state-of-the-art systems and yields 75.0% in F1 on the ERD14 dataset and 56.9% on the GERDAQ dataset.

2016

pdf bib
Solving and Generating Chinese Character Riddles
Chuanqi Tan | Furu Wei | Li Dong | Weifeng Lv | Ming Zhou
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Redundancy-Aware Sentence Regression Framework for Extractive Summarization
Pengjie Ren | Furu Wei | Zhumin Chen | Jun Ma | Ming Zhou
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Existing sentence regression methods for extractive summarization usually model sentence importance and redundancy in two separate processes. They first evaluate the importance f(s) of each sentence s and then select sentences to generate a summary based on both the importance scores and redundancy among sentences. In this paper, we propose to model importance and redundancy simultaneously by directly evaluating the relative importance f(s|S) of a sentence s given a set of selected sentences S. Specifically, we present a new framework to conduct regression with respect to the relative gain of s given S calculated by the ROUGE metric. Besides the single sentence features, additional features derived from the sentence relations are incorporated. Experiments on the DUC 2001, 2002 and 2004 multi-document summarization datasets show that the proposed method outperforms state-of-the-art extractive summarization approaches.

pdf bib
AttSum: Joint Learning of Focusing and Summarization with Neural Attention
Ziqiang Cao | Wenjie Li | Sujian Li | Furu Wei | Yanran Li
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Query relevance ranking and sentence saliency ranking are the two main tasks in extractive query-focused summarization. Previous supervised summarization systems often perform the two tasks in isolation. However, since reference summaries are the trade-off between relevance and saliency, using them as supervision, neither of the two rankers could be trained well. This paper proposes a novel summarization system called AttSum, which tackles the two tasks jointly. It automatically learns distributed representations for sentences as well as the document cluster. Meanwhile, it applies the attention mechanism to simulate the attentive reading of human behavior when a query is given. Extensive experiments are conducted on DUC query-focused summarization benchmark datasets. Without using any hand-crafted features, AttSum achieves competitive performance. We also observe that the sentences recognized to focus on the query indeed meet the query need.

2015

pdf bib
Cross-lingual Sentiment Lexicon Learning With Bilingual Word Graph Label Propagation
Dehong Gao | Furu Wei | Wenjie Li | Xiaohua Liu | Ming Zhou
Computational Linguistics, Volume 41, Issue 1 - March 2015

pdf bib
A Statistical Parsing Framework for Sentiment Classification
Li Dong | Furu Wei | Shujie Liu | Ming Zhou | Ke Xu
Computational Linguistics, Volume 41, Issue 2 - June 2015

pdf bib
Question Answering over Freebase with Multi-Column Convolutional Neural Networks
Li Dong | Furu Wei | Ming Zhou | Ke Xu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
A Dependency-Based Neural Network for Relation Classification
Yang Liu | Furu Wei | Sujian Li | Heng Ji | Ming Zhou | Houfeng Wang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Learning Summary Prior Representation for Extractive Summarization
Ziqiang Cao | Furu Wei | Sujian Li | Wenjie Li | Ming Zhou | Houfeng Wang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Splusplus: A Feature-Rich Two-stage Classifier for Sentiment Analysis of Tweets
Li Dong | Furu Wei | Yichun Yin | Ming Zhou | Ke Xu
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib
Coooolll: A Deep Learning System for Twitter Sentiment Classification
Duyu Tang | Furu Wei | Bing Qin | Ting Liu | Ming Zhou
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
Duyu Tang | Furu Wei | Nan Yang | Ming Zhou | Ting Liu | Bing Qin
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification
Li Dong | Furu Wei | Chuanqi Tan | Duyu Tang | Ming Zhou | Ke Xu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Building Large-Scale Twitter-Specific Sentiment Lexicon : A Representation Learning Approach
Duyu Tang | Furu Wei | Bing Qin | Ming Zhou | Ting Liu
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
A Joint Segmentation and Classification Framework for Sentiment Analysis
Duyu Tang | Furu Wei | Bing Qin | Li Dong | Ting Liu | Ming Zhou
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Entity Linking for Tweets
Xiaohua Liu | Yitong Li | Haocheng Wu | Ming Zhou | Furu Wei | Yi Lu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib
Joint Inference of Named Entity Recognition and Normalization for Tweets
Xiaohua Liu | Ming Zhou | Xiangyang Zhou | Zhongyang Fu | Furu Wei
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Cross-Lingual Mixture Model for Sentiment Classification
Xinfan Meng | Furu Wei | Xiaohua Liu | Ming Zhou | Ge Xu | Houfeng Wang
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
QuickView: NLP-based Tweet Search
Xiaohua Liu | Furu Wei | Ming Zhou | QuickView Team Microsoft
Proceedings of the ACL 2012 System Demonstrations

pdf bib
Twitter Topic Summarization by Ranking Tweets using Social Influence and Content Quality
Yajuan Duan | Zhumin Chen | Furu Wei | Ming Zhou | Heung-Yeung Shum
Proceedings of COLING 2012

pdf bib
Graph-Based Multi-Tweet Summarization using Social Signals
Xiaohua Liu | Yitong Li | Furu Wei | Ming Zhou
Proceedings of COLING 2012

pdf bib
Lost in Translations? Building Sentiment Lexicons using Context Based Machine Translation
Xinfan Meng | Furu Wei | Ge Xu | Longkai Zhang | Xiaohua Liu | Ming Zhou | Houfeng Wang
Proceedings of COLING 2012: Posters

2011

pdf bib
Recognizing Named Entities in Tweets
Xiaohua Liu | Shaodian Zhang | Furu Wei | Ming Zhou
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2009

pdf bib
Co-Feedback Ranking for Query-Focused Summarization
Furu Wei | Wenjie Li | Yanxiang He
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

pdf bib
A Novel Feature-based Approach to Chinese Entity Relation Extraction
Wenjie Li | Peng Zhang | Furu Wei | Yuexian Hou | Qin Lu
Proceedings of ACL-08: HLT, Short Papers

pdf bib
Exploiting the Role of Position Feature in Chinese Relation Extraction
Peng Zhang | Wenjie Li | Furu Wei | Qin Lu | Yuexian Hou
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Relation extraction is the task of finding pre-defined semantic relations between two entities or entity mentions from text. Many methods, such as feature-based and kernel-based methods, have been proposed in the literature. Among them, feature-based methods draw much attention from researchers. However, to the best of our knowledge, existing feature-based methods did not explicitly incorporate the position feature and no in-depth analysis was conducted in this regard. In this paper, we define and exploit nine types of position information between two named entity mentions and then use it along with other features in a multi-class classification framework for Chinese relation extraction. Experiments on the ACE 2005 data set show that the position feature is more effective than the other recognized features like entity type/subtype and character-based N-gram context. Most important, it can be easily captured and does not require as much effort as applying deep natural language processing.

pdf bib
PNR2: Ranking Sentences with Positive and Negative Reinforcement for Query-Oriented Update Summarization
Wenjie Li | Furu Wei | Qin Lu | Yanxiang He
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)