Danqi Chen


2020

Open-Domain Question Answering
Danqi Chen | Wen-tau Yih
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

This tutorial provides a comprehensive and coherent overview of cutting-edge research in open-domain question answering (QA), the task of answering questions using a large collection of documents covering diverse topics. We will start by giving a brief historical background, discussing the basic setup and core technical challenges of the research problem, and then describing modern datasets along with common evaluation metrics and benchmarks. The focus will then shift to cutting-edge models proposed for open-domain QA, including two-stage retriever-reader approaches, dense retrievers with end-to-end training, and retriever-free methods. Finally, we will cover hybrid approaches that use both text and large knowledge bases and conclude the tutorial with important open questions. We hope that the tutorial will not only help the audience acquire up-to-date knowledge but also provide new perspectives to stimulate advances in open-domain QA research in its next phase.

TextHide: Tackling Data Privacy in Language Understanding Tasks
Yangsibo Huang | Zhao Song | Danqi Chen | Kai Li | Sanjeev Arora
Findings of the Association for Computational Linguistics: EMNLP 2020

An unsolved challenge in distributed or federated learning is to effectively mitigate privacy risks without slowing down training or reducing accuracy. In this paper, we propose TextHide to address this challenge for natural language understanding tasks. It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data. This encryption step is efficient and affects task performance only slightly. In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e.g., BERT) for any sentence or sentence-pair task. We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend against attacks on shared gradients or representations, with an average accuracy reduction of only 1.9%. We also present an analysis of the security of TextHide using a conjecture about the computational intractability of a mathematical problem.
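
The encryption step itself is only summarized above; at a high level, TextHide hides a sentence's hidden representation by mixing it with representations of other sentences and applying a random sign mask before sharing it. The following is a minimal, illustrative sketch of such a mixing-and-masking step, not the paper's exact scheme (the Dirichlet coefficients and the construction of the pool are assumptions made for illustration):

```python
import numpy as np

def texthide_encode(rep, pool, k=4, rng=None):
    """Hedged sketch: hide a sentence representation by mixing it with k-1
    representations drawn from a pool, then flipping signs with a random
    +/-1 mask. Coefficient and mask choices are illustrative, not the
    paper's exact scheme."""
    rng = rng or np.random.default_rng()
    others = pool[rng.choice(len(pool), size=k - 1, replace=False)]
    coeffs = rng.dirichlet(np.ones(k))           # random convex combination
    mixed = coeffs[0] * rep + coeffs[1:] @ others
    sign_mask = rng.choice([-1.0, 1.0], size=rep.shape)
    return sign_mask * mixed                     # only this hidden vector is shared

# toy usage with 768-d BERT-style sentence vectors
pool = np.random.randn(100, 768).astype(np.float32)
hidden = texthide_encode(pool[0], pool[1:])
```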

Dense Passage Retrieval for Open-Domain Question Answering
Vladimir Karpukhin | Barlas Oguz | Sewon Min | Patrick Lewis | Ledell Wu | Sergey Edunov | Danqi Chen | Wen-tau Yih
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system by 9%–19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish a new state of the art on multiple open-domain QA benchmarks.
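
At its core, the dual-encoder framework scores a question-passage pair by the inner product of two independently encoded vectors and is trained so that the gold passage outscores in-batch negatives. Below is a minimal sketch of that training objective; the encoders themselves (e.g., two BERT models) are omitted and the function name is illustrative:

```python
import torch
import torch.nn.functional as F

def in_batch_negative_loss(q_vecs, p_vecs):
    """q_vecs: (B, d) question embeddings; p_vecs: (B, d) embeddings of each
    question's gold passage. Every other passage in the batch serves as a
    negative; the loss is softmax cross-entropy over inner-product scores."""
    scores = q_vecs @ p_vecs.T                  # (B, B) similarity matrix
    targets = torch.arange(q_vecs.size(0))      # gold passage lies on the diagonal
    return F.cross_entropy(scores, targets)

# toy usage with random "BERT [CLS]" vectors
q = torch.randn(8, 768)
p = torch.randn(8, 768)
loss = in_batch_negative_loss(q, p)
```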

SpanBERT: Improving Pre-training by Representing and Predicting Spans
Mandar Joshi | Danqi Chen | Yinhan Liu | Daniel S. Weld | Luke Zettlemoyer | Omer Levy
Transactions of the Association for Computational Linguistics, Volume 8

We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0, respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even gains on GLUE.
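
The span masking scheme can be illustrated with a short sketch: span lengths are drawn from a clipped geometric distribution and whole contiguous spans are masked until roughly 15% of tokens are covered. This is a simplified illustration (the paper additionally masks complete words and adds a span boundary objective); the parameter values below follow the paper's description, but the helper name is made up:

```python
import numpy as np

def sample_span_mask(seq_len, budget_ratio=0.15, p=0.2, max_span=10, rng=None):
    """Hedged sketch of SpanBERT-style masking: repeatedly draw a span length
    from a geometric distribution (clipped at max_span) and a random start,
    masking whole contiguous spans until ~budget_ratio of tokens are masked."""
    rng = rng or np.random.default_rng()
    mask = np.zeros(seq_len, dtype=bool)
    budget = int(seq_len * budget_ratio)
    while mask.sum() < budget:
        length = min(rng.geometric(p), max_span)
        start = rng.integers(0, max(1, seq_len - length))
        mask[start:start + length] = True
    return mask

print(sample_span_mask(128).sum(), "of 128 tokens masked")
```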

2019

CoQA: A Conversational Question Answering Challenge
Siva Reddy | Danqi Chen | Christopher D. Manning
Transactions of the Association for Computational Linguistics, Volume 7

Humans gather information through conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets (e.g., coreference and pragmatic reasoning). We evaluate strong dialogue and reading comprehension models on CoQA. The best system obtains an F1 score of 65.4%, which is 23.4 points behind human performance (88.8%), indicating that there is ample room for improvement. We present CoQA as a challenge to the community at https://stanfordnlp.github.io/coqa.

A Discrete Hard EM Approach for Weakly Supervised Question Answering
Sewon Min | Danqi Chen | Hannaneh Hajishirzi | Luke Zettlemoyer
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Many question answering (QA) tasks only provide weak supervision for how the answer should be computed. For example, TriviaQA answers are entities that can be mentioned multiple times in supporting documents, while DROP answers can be computed by deriving many different equations from numbers in the reference text. In this paper, we show it is possible to convert such tasks into discrete latent variable learning problems with a precomputed, task-specific set of possible solutions (e.g., different mentions or equations) that contains one correct option. We then develop a hard EM learning scheme that computes gradients relative to the most likely solution at each update. Despite its simplicity, we show that this approach significantly outperforms previous methods on six QA tasks, including absolute gains of 2–10%, and achieves state-of-the-art results on five of them. Using hard updates instead of maximizing marginal likelihood is key to these results, as it encourages the model to find the one correct answer, which we show through detailed qualitative analysis.
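
Concretely, the hard EM update replaces marginalizing over all precomputed candidate solutions with a max: at each step the model back-propagates only through the log-likelihood of the candidate it currently scores highest. A minimal sketch of that loss is given below; tensor shapes and names are illustrative:

```python
import torch

def hard_em_loss(log_probs, candidate_mask):
    """log_probs: (B, C) model log-likelihood of every precomputed candidate
    solution; candidate_mask: (B, C) bool, True where a candidate is valid.
    Hard EM keeps only the most likely valid candidate per example, instead
    of marginalizing (log-sum-exp) over all of them."""
    masked = log_probs.masked_fill(~candidate_mask, float("-inf"))
    best = masked.max(dim=-1).values          # log p of the argmax candidate
    return -best.mean()

# toy usage: 4 examples, 6 candidate solutions each
lp = torch.log_softmax(torch.randn(4, 6), dim=-1)
mask = torch.rand(4, 6) > 0.3
mask[:, 0] = True                             # ensure at least one valid candidate
loss = hard_em_loss(lp, mask)
```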

Proceedings of the 2nd Workshop on Machine Reading for Question Answering
Adam Fisch | Alon Talmor | Robin Jia | Minjoon Seo | Eunsol Choi | Danqi Chen
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension
Adam Fisch | Alon Talmor | Robin Jia | Minjoon Seo | Eunsol Choi | Danqi Chen
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems. In this task, we adapted and unified 18 distinct question answering datasets into the same format. Among them, six datasets were made available for training, six datasets were made available for development, and the rest were hidden for final evaluation. Ten teams submitted systems, which explored various ideas including data sampling, multi-task learning, adversarial training and ensembling. The best system achieved an average F1 score of 72.5 on the 12 held-out datasets, 10.7 absolute points higher than our initial baseline based on BERT.
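
The F1 numbers reported here are the standard token-overlap F1 used for extractive QA: a prediction and a gold answer are compared as bags of tokens and the score is averaged over examples. A simplified sketch of that metric follows (the official evaluation scripts additionally normalize text and take the maximum over multiple gold answers):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string.
    Simplified sketch: real evaluation scripts also lowercase, strip
    punctuation and articles, and take the max over several gold answers."""
    pred, gold = prediction.split(), reference.split()
    common = Counter(pred) & Counter(gold)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the eiffel tower", "eiffel tower"))  # 0.8
```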

2018

Proceedings of the Workshop on Machine Reading for Question Answering
Eunsol Choi | Minjoon Seo | Danqi Chen | Robin Jia | Jonathan Berant
Proceedings of the Workshop on Machine Reading for Question Answering

2017

Reading Wikipedia to Answer Open-Domain Questions
Danqi Chen | Adam Fisch | Jason Weston | Antoine Bordes
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with those of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning with distant supervision over their combination yields an effective complete system for this challenging task.
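
The retrieval component described here is essentially a hashed bag-of-n-grams TF-IDF match: unigrams and bigrams are hashed into a fixed-size feature space and articles are ranked by a TF-IDF-weighted inner product with the question. The sketch below illustrates the idea only; the released DrQA system uses a dedicated hash function, a much larger feature space, and more careful tokenization and weighting:

```python
from collections import Counter
import math

NUM_BINS = 2 ** 20  # illustrative; the released system hashes into a larger space

def hashed_ngrams(text):
    """Map unigrams and bigrams of a text to hashed feature ids with counts."""
    toks = text.lower().split()
    grams = toks + [" ".join(pair) for pair in zip(toks, toks[1:])]
    return Counter(hash(g) % NUM_BINS for g in grams)

def tfidf_score(query, doc, doc_freq, n_docs):
    """Rank a document by a TF-IDF-weighted inner product of hashed n-gram vectors."""
    q, d = hashed_ngrams(query), hashed_ngrams(doc)
    score = 0.0
    for feat, q_tf in q.items():
        idf = math.log((n_docs + 1) / (doc_freq.get(feat, 0) + 1))
        score += q_tf * d.get(feat, 0) * idf * idf
    return score

# toy usage: document frequencies over a two-article "corpus"
docs = ["the eiffel tower is in paris", "question answering with wikipedia"]
df = Counter(feat for doc in docs for feat in hashed_ngrams(doc))
print(tfidf_score("where is the eiffel tower", docs[0], df, len(docs)))
```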

Position-aware Attention and Supervised Data Improve Slot Filling
Yuhao Zhang | Victor Zhong | Danqi Chen | Gabor Angeli | Christopher D. Manning
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Organized relational knowledge in the form of “knowledge graphs” is important for many applications. However, the ability to populate knowledge bases with facts automatically extracted from documents has improved frustratingly slowly. This paper simultaneously addresses two issues that have held back prior work. We first propose an effective new model, which combines an LSTM sequence model with a form of entity position-aware attention that is better suited to relation extraction. Then we build TACRED, a large (119,474 examples) supervised relation extraction dataset obtained via crowdsourcing and targeted towards TAC KBP relations. The combination of better supervised data and a more appropriate high-capacity model enables much better relation extraction performance. When the model trained on this new dataset replaces the previous relation extraction component of the best TAC KBP 2015 slot filling system, its F1 score increases markedly from 22.2% to 26.7%.
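
The position-aware attention layer can be sketched compactly: each LSTM hidden state is scored by a small feed-forward function of the state itself, a sentence summary vector, and embeddings of the token's relative distance to the subject and object entities, and the sentence representation is the attention-weighted sum of hidden states. The PyTorch sketch below is illustrative; dimensions, clamping, and parameter names are assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class PositionAwareAttention(nn.Module):
    """Sketch of position-aware attention for relation extraction: scores
    combine each hidden state, a sentence summary, and embeddings of the
    token's distance to the subject and object entities."""
    def __init__(self, hidden_dim, pos_dim=30, attn_dim=200, max_dist=100):
        super().__init__()
        self.pos_emb = nn.Embedding(2 * max_dist + 1, pos_dim)
        self.max_dist = max_dist
        self.proj = nn.Linear(hidden_dim * 2 + pos_dim * 2, attn_dim)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, hiddens, summary, subj_dist, obj_dist):
        # hiddens: (B, T, H), summary: (B, H), subj_dist/obj_dist: (B, T) signed offsets
        subj = self.pos_emb(subj_dist.clamp(-self.max_dist, self.max_dist) + self.max_dist)
        obj = self.pos_emb(obj_dist.clamp(-self.max_dist, self.max_dist) + self.max_dist)
        summary = summary.unsqueeze(1).expand_as(hiddens)
        scores = self.v(torch.tanh(self.proj(torch.cat([hiddens, summary, subj, obj], dim=-1))))
        attn = torch.softmax(scores.squeeze(-1), dim=1)       # (B, T) attention weights
        return (attn.unsqueeze(-1) * hiddens).sum(dim=1)      # (B, H) sentence vector
```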

2016

A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task
Danqi Chen | Jason Bolton | Christopher D. Manning
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the 5th Workshop on Automated Knowledge Base Construction
Jay Pujara | Tim Rocktaschel | Danqi Chen | Sameer Singh
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

2015

Representing Text for Joint Embedding of Text and Knowledge Bases
Kristina Toutanova | Danqi Chen | Patrick Pantel | Hoifung Poon | Pallavi Choudhury | Michael Gamon
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Observed versus latent features for knowledge base and text inference
Kristina Toutanova | Danqi Chen
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality

2014

A Fast and Accurate Dependency Parser using Neural Networks
Danqi Chen | Christopher Manning
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)