Eunsol Choi


2020

Entities as Experts: Sparse Memory Access with Entity Supervision
Thibault Févry | Livio Baldini Soares | Nicholas FitzGerald | Eunsol Choi | Tom Kwiatkowski
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We focus on the problem of capturing declarative knowledge about entities in the learned parameters of a language model. We introduce a new model, Entities as Experts (EaE), that can access distinct memories of the entities mentioned in a piece of text. Unlike previous efforts to integrate entity knowledge into sequence models, EaE’s entity representations are learned directly from text. We show that EaE’s learned representations capture sufficient knowledge to answer TriviaQA questions such as “Which Dr. Who villain has been played by Roger Delgado, Anthony Ainley, Eric Roberts?”, outperforming an encoder-generator Transformer model with 10x the parameters on this task. According to the LAMA knowledge probes, EaE contains more factual knowledge than a similarly sized BERT, as well as previous approaches that integrate external sources of entity knowledge. Because EaE associates parameters with specific entities, it only needs to access a fraction of its parameters at inference time, and we show that the correct identification and representation of entities is essential to EaE’s performance.

TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
Jonathan H. Clark | Eunsol Choi | Michael Collins | Dan Garrette | Tom Kwiatkowski | Vitaly Nikolaev | Jennimaria Palomaki
Transactions of the Association for Computational Linguistics, Volume 8

Confidently making progress on multilingual modeling requires challenging, trustworthy evaluations. We present TyDi QA—a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology—the set of linguistic features each language expresses—such that we expect models performing well on this set to generalize across a large number of the world’s languages. We present a quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora. To provide a realistic information-seeking task and avoid priming effects, questions are written by people who want to know the answer, but don’t know the answer yet, and the data is collected directly in each language without the use of translation.

2019

Proceedings of the 2nd Workshop on Machine Reading for Question Answering
Adam Fisch | Alon Talmor | Robin Jia | Minjoon Seo | Eunsol Choi | Danqi Chen
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension
Adam Fisch | Alon Talmor | Robin Jia | Minjoon Seo | Eunsol Choi | Danqi Chen
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems. In this task, we adapted and unified 18 distinct question answering datasets into the same format. Among them, six datasets were made available for training, six datasets were made available for development, and the rest were hidden for final evaluation. Ten teams submitted systems, which explored various ideas including data sampling, multi-task learning, adversarial training and ensembling. The best system achieved an average F1 score of 72.5 on the 12 held-out datasets, 10.7 absolute points higher than our initial baseline based on BERT.

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Fernando Alva-Manchego | Eunsol Choi | Daniel Khashabi
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

No Permanent Friends or Enemies: Tracking Relationships between Nations from News
Xiaochuang Han | Eunsol Choi | Chenhao Tan
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Understanding the dynamics of international politics is important yet challenging for civilians. In this work, we explore unsupervised neural models to infer relations between nations from news articles. We extend existing models by incorporating shallow linguistics information and propose a new automatic evaluation metric that aligns relationship dynamics with manually annotated key events. As understanding international relations requires carefully analyzing complex relationships, we conduct in-person human evaluations with three groups of participants. Overall, humans prefer the outputs of our model and give insightful feedback that suggests future directions for human-centered models. Furthermore, our model reveals interesting regional differences in news coverage. For instance, with respect to US-China relations, Singaporean media focus more on “strengthening” and “purchasing”, while US media focus more on “criticizing” and “denouncing”.

pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Mandar Joshi | Eunsol Choi | Omer Levy | Daniel Weld | Luke Zettlemoyer
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Reasoning about implied relationships (e.g. paraphrastic, common sense, encyclopedic) between pairs of words is crucial for many cross-sentence inference problems. This paper proposes new methods for learning and using embeddings of word pairs that implicitly represent background knowledge about such relationships. Our pairwise embeddings are computed as a compositional function of each word’s representation, which is learned by maximizing the pointwise mutual information (PMI) with the contexts in which the two words co-occur. We add these representations to the cross-sentence attention layer of existing inference models (e.g. BiDAF for QA, ESIM for NLI), instead of extending or replacing existing word embeddings. Experiments show a gain of 2.7% on the recently released SQuAD 2.0 and 1.3% on MultiNLI. Our representations also aid in better generalization with gains of around 6-7% on adversarial SQuAD datasets, and 8.8% on the adversarial entailment test set by Glockner et al. (2018).

2018

Neural Metaphor Detection in Context
Ge Gao | Eunsol Choi | Yejin Choi | Luke Zettlemoyer
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present end-to-end neural models for detecting metaphorical word use in context. We show that relatively standard BiLSTM models which operate on complete sentences work well in this setting, in comparison to previous work that used more restricted forms of linguistic context. These models establish a new state-of-the-art on existing verb metaphor detection benchmarks, and show strong performance on jointly predicting the metaphoricity of all words in a running text.

QuAC: Question Answering in Context
Eunsol Choi | He He | Mohit Iyyer | Mark Yatskar | Wen-tau Yih | Yejin Choi | Percy Liang | Luke Zettlemoyer
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total). The dialogs involve two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as we show in a detailed qualitative evaluation. We also report results for a number of reference models, including a recently state-of-the-art reading comprehension architecture extended to model dialog context. Our best model underperforms humans by 20 F1, suggesting that there is significant room for future work on this data. Dataset, baseline, and leaderboard available at http://quac.ai.

Ultra-Fine Entity Typing
Eunsol Choi | Omer Levy | Yejin Choi | Luke Zettlemoyer
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce a new entity typing task: given a sentence with an entity mention, the goal is to predict a set of free-form phrases (e.g. skyscraper, songwriter, or criminal) that describe appropriate types for the target entity. This formulation allows us to use a new type of distant supervision at large scale: head words, which indicate the type of the noun phrases they appear in. We show that these ultra-fine types can be crowd-sourced, and introduce new evaluation sets that are much more diverse and fine-grained than existing benchmarks. We present a model that can predict ultra-fine types, and is trained using a multitask objective that pools our new head-word supervision with prior supervision from entity linking. Experimental results demonstrate that our model is effective in predicting entity types at varying granularity; it achieves state-of-the-art performance on an existing fine-grained entity typing benchmark, and sets baselines for our newly introduced datasets.

Proceedings of the Workshop on Machine Reading for Question Answering
Eunsol Choi | Minjoon Seo | Danqi Chen | Robin Jia | Jonathan Berant
Proceedings of the Workshop on Machine Reading for Question Answering

2017

Coarse-to-Fine Question Answering for Long Documents
Eunsol Choi | Daniel Hewlett | Jakob Uszkoreit | Illia Polosukhin | Alexandre Lacoste | Jonathan Berant
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a framework for question answering that can efficiently scale to longer documents while maintaining or even improving performance of state-of-the-art models. While most successful approaches for reading comprehension rely on recurrent neural networks (RNNs), running them over long documents is prohibitively slow because it is difficult to parallelize over sequences. Inspired by how people first skim the document, identify relevant parts, and carefully read these parts to produce an answer, we combine a coarse, fast model for selecting relevant sentences and a more expensive RNN for producing the answer from those sentences. We treat sentence selection as a latent variable trained jointly from the answer only using reinforcement learning. Experiments demonstrate state-of-the-art performance on a challenging subset of the WikiReading dataset and on a new dataset, while speeding up the model by 3.5x-6.7x.

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi | Eunsol Choi | Daniel Weld | Luke Zettlemoyer
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and (3) requires more cross sentence reasoning to find answers. We also present two baseline algorithms: a feature-based classifier and a state-of-the-art neural network that performs well on SQuAD reading comprehension. Neither approach comes close to human performance (23% and 40% vs. 80%), suggesting that TriviaQA is a challenging testbed that is worth significant future study.

Zero-Shot Relation Extraction via Reading Comprehension
Omer Levy | Minjoon Seo | Eunsol Choi | Luke Zettlemoyer
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

We show that relation extraction can be reduced to answering simple reading comprehension questions, by associating one or more natural-language questions with each relation slot. This reduction has several advantages: we can (1) learn relation-extraction models by extending recent neural reading-comprehension techniques, (2) build very large training sets for those models by combining relation-specific crowd-sourced questions with distant supervision, and even (3) do zero-shot learning by extracting new relation types that are only specified at test-time, for which we have no labeled training examples. Experiments on a Wikipedia slot-filling task demonstrate that the approach can generalize to new questions for known relation types with high accuracy, and that zero-shot generalization to unseen relation types is possible, at lower accuracy levels, setting the bar for future work on this task.

Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking
Hannah Rashkin | Eunsol Choi | Jin Yea Jang | Svitlana Volkova | Yejin Choi
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present an analytic study on the language of news media in the context of political fact-checking and fake news detection. We compare the language of real news with that of satire, hoaxes, and propaganda to find linguistic characteristics of untrustworthy text. To probe the feasibility of automatic political fact-checking, we also present a case study based on PolitiFact.com using their factuality judgments on a 6-point scale. Experiments show that while media fact-checking remains an open research question, stylistic cues can help determine the truthfulness of text.

2016

Extracting Structured Scholarly Information from the Machine Translation Literature
Eunsol Choi | Matic Horvat | Jonathan May | Kevin Knight | Daniel Marcu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Understanding the experimental results of a scientific paper is crucial to understanding its contribution and to comparing it with related work. We introduce a structured, queryable representation for experimental results and a baseline system that automatically populates this representation. The representation can answer compositional questions such as: “Which are the best published results reported on the NIST 09 Chinese to English dataset?” and “What are the most important methods for speeding up phrase-based decoding?” Answering such questions usually involves lengthy literature surveys. Current machine reading for academic papers does not usually consider the actual experiments, but mostly focuses on understanding abstracts. We describe annotation work to create an initial ⟨scientific paper; experimental results representation⟩ corpus. The corpus is composed of 67 papers which were manually annotated with a structured representation of experimental results by domain experts. Additionally, we present a baseline algorithm that characterizes the difficulty of the inference task.

Proceedings of the NAACL Student Research Workshop
Jacob Andreas | Eunsol Choi | Angeliki Lazaridou
Proceedings of the NAACL Student Research Workshop

Document-level Sentiment Inference with Social, Faction, and Discourse Context
Eunsol Choi | Hannah Rashkin | Luke Zettlemoyer | Yejin Choi
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Scalable Semantic Parsing with Partial Ontologies
Eunsol Choi | Tom Kwiatkowski | Luke Zettlemoyer
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2013

Scaling Semantic Parsers with On-the-Fly Ontology Matching
Tom Kwiatkowski | Eunsol Choi | Yoav Artzi | Luke Zettlemoyer
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

Hedge Detection as a Lens on Framing in the GMO Debates: A Position Paper
Eunsol Choi | Chenhao Tan | Lillian Lee | Cristian Danescu-Niculescu-Mizil | Jennifer Spindel
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics