Chandra Bhagavatula


2020

pdf bib
Generative Data Augmentation for Commonsense Reasoning
Yiben Yang | Chaitanya Malaviya | Jared Fernandez | Swabha Swayamdipta | Ronan Le Bras | Ji-Ping Wang | Chandra Bhagavatula | Yejin Choi | Doug Downey
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent advances in commonsense reasoning depend on large-scale human-annotated training sets to achieve peak performance. However, manual curation of training sets is expensive and has been shown to introduce annotation artifacts that neural models can readily exploit and overfit to. We propose a novel generative data augmentation technique, G-DAUGˆC, that aims to achieve more accurate and robust learning in a low-resource setting. Our approach generates synthetic examples using pretrained language models and selects the most informative and diverse set of examples for data augmentation. On experiments with multiple commonsense reasoning benchmarks, G-DAUGˆC consistently outperforms existing data augmentation methods based on back-translation, establishing a new state-of-the-art on WinoGrande, CODAH, and CommonsenseQA, as well as enhances out-of-distribution generalization, proving to be robust against adversaries or perturbations. Our analysis demonstrates that G-DAUGˆC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.

pdf bib
CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning
Bill Yuchen Lin | Wangchunshu Zhou | Ming Shen | Pei Zhou | Chandra Bhagavatula | Yejin Choi | Xiang Ren
Findings of the Association for Computational Linguistics: EMNLP 2020

Recently, large-scale pre-trained language models have demonstrated impressive performance on several commonsense-reasoning benchmark datasets. However, building machines with commonsense to compose realistically plausible sentences remains challenging. In this paper, we present a constrained text generation task, CommonGen associated with a benchmark dataset, to explicitly test machines for the ability of generative commonsense reasoning. Given a set of common concepts (e.g., dog, frisbee, catch, throw); the task is to generate a coherent sentence describing an everyday scenario using these concepts (e.g., “a man throws a frisbee and his dog catches it”). The CommonGen task is challenging because it inherently requires 1) relational reasoning with background commonsense knowledge and 2) compositional generalization ability to work on unseen concept combinations. Our dataset, constructed through a combination of crowdsourced and existing caption corpora, consists of 77k commonsense descriptions over 35k unique concept-sets. Experiments show that there is a large gap between state-of-the-art text generation models (e.g., T5) and human performance (31.6% v.s. 63.5% in SPICE metric). Furthermore, we demonstrate that the learned generative commonsense reasoning capability can be transferred to improve downstream tasks such as CommonsenseQA (76.9% to 78.4 in dev accuracy) by generating additional context.

pdf bib
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
Ana Marasović | Chandra Bhagavatula | Jae sung Park | Ronan Le Bras | Noah A. Smith | Yejin Choi
Findings of the Association for Computational Linguistics: EMNLP 2020

Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights. We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entailment, and visual question answering. The key challenge of accurate rationalization is comprehensive image understanding at all levels: not just their explicit content at the pixel level, but their contextual contents at the semantic and pragmatic levels. We present RationaleˆVT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and visual commonsense graphs. Our experiments show that free-text rationalization is a promising research direction to complement model interpretability for complex visual-textual reasoning tasks. In addition, we find that integration of richer semantic and pragmatic visual features improves visual fidelity of rationales.

pdf bib
Thinking Like a Skeptic: Defeasible Inference in Natural Language
Rachel Rudinger | Vered Shwartz | Jena D. Hwang | Chandra Bhagavatula | Maxwell Forbes | Ronan Le Bras | Noah A. Smith | Yejin Choi
Findings of the Association for Computational Linguistics: EMNLP 2020

Defeasible inference is a mode of reasoning in which an inference (X is a bird, therefore X flies) may be weakened or overturned in light of new evidence (X is a penguin). Though long recognized in classical AI and philosophy, defeasible inference has not been extensively studied in the context of contemporary data-driven research on natural language inference and commonsense reasoning. We introduce Defeasible NLI (abbreviated 𝛿-NLI), a dataset for defeasible inference in natural language. Defeasible NLI contains extensions to three existing inference datasets covering diverse modes of reasoning: common sense, natural language inference, and social norms. From Defeasible NLI, we develop both a classification and generation task for defeasible inference, and demonstrate that the generation task is much more challenging. Despite lagging human performance, however, generative models trained on this data are capable of writing sentences that weaken or strengthen a specified inference up to 68% of the time.

pdf bib
Back to the Future: Unsupervised Backprop-based Decoding for Counterfactual and Abductive Commonsense Reasoning
Lianhui Qin | Vered Shwartz | Peter West | Chandra Bhagavatula | Jena D. Hwang | Ronan Le Bras | Antoine Bosselut | Yejin Choi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Abductive and counterfactual reasoning, core abilities of everyday human cognition, require reasoning about what might have happened at time t, while conditioning on multiple contexts from the relative past and future. However, simultaneous incorporation of past and future contexts using generative language models (LMs) can be challenging, as they are trained either to condition only on the past context or to perform narrowly scoped text-infilling. In this paper, we propose DeLorean, a new unsupervised decoding algorithm that can flexibly incorporate both the past and future contexts using only off-the-shelf, left-to-right language models and no supervision. The key intuition of our algorithm is incorporating the future through back-propagation, during which, we only update the internal representation of the output while fixing the model parameters. By alternating between forward and backward propagation, DeLorean can decode the output representation that reflects both the left and right contexts. We demonstrate that our approach is general and applicable to two nonmonotonic reasoning tasks: abductive text generation and counterfactual story revision, where DeLorean outperforms a range of unsupervised and some supervised methods, based on automatic and human evaluation.

pdf bib
Unsupervised Commonsense Question Answering with Self-Talk
Vered Shwartz | Peter West | Ronan Le Bras | Chandra Bhagavatula | Yejin Choi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Natural language understanding involves reading between the lines with implicit background knowledge. Current systems either rely on pre-trained language models as the sole implicit source of world knowledge, or resort to external knowledge bases (KBs) to incorporate additional relevant knowledge. We propose an unsupervised framework based on self-talk as a novel alternative to multiple-choice commonsense tasks. Inspired by inquiry-based discovery learning (Bruner, 1961), our approach inquires language models with a number of information seeking questions such as “what is the definition of...” to discover additional background knowledge. Empirical results demonstrate that the self-talk procedure substantially improves the performance of zero-shot language model baselines on four out of six commonsense benchmarks, and competes with models that obtain knowledge from external KBs. While our approach improves performance on several benchmarks, the self-talk induced knowledge even when leading to correct answers is not always seen as helpful by human judges, raising interesting questions about the inner-workings of pre-trained language models for commonsense reasoning.

2019

pdf bib
Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning
Lifu Huang | Ronan Le Bras | Chandra Bhagavatula | Yejin Choi
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Understanding narratives requires reading between the lines, which in turn, requires interpreting the likely causes and effects of events, even when they are not mentioned explicitly. In this paper, we introduce Cosmos QA, a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. In stark contrast to most existing reading comprehension datasets where the questions focus on factual and literal understanding of the context paragraph, our dataset focuses on reading between the lines over a diverse collection of people’s everyday narratives, asking such questions as “what might be the possible reason of ...?", or “what would have happened if ..." that require reasoning beyond the exact text spans in the context. To establish baseline performances on Cosmos QA, we experiment with several state-of-the-art neural architectures for reading comprehension, and also propose a new architecture that improves over the competitive baselines. Experimental results demonstrate a significant gap between machine (68.4%) and human performance (94%), pointing to avenues for future research on commonsense machine comprehension. Dataset, code and leaderboard is publicly available at https://wilburone.github.io/cosmos.

pdf bib
Counterfactual Story Reasoning and Generation
Lianhui Qin | Antoine Bosselut | Ari Holtzman | Chandra Bhagavatula | Elizabeth Clark | Yejin Choi
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Counterfactual reasoning requires predicting how alternative events, contrary to what actually happened, might have resulted in different outcomes. Despite being considered a necessary component of AI-complete systems, few resources have been developed for evaluating counterfactual reasoning in narratives. In this paper, we propose Counterfactual Story Rewriting: given an original story and an intervening counterfactual event, the task is to minimally revise the story to make it compatible with the given counterfactual event. Solving this task will require deep understanding of causal narrative chains and counterfactual invariance, and integration of such story reasoning capabilities into conditional language generation models. We present TIMETRAVEL, a new dataset of 29,849 counterfactual rewritings, each with the original story, a counterfactual event, and human-generated revision of the original story compatible with the counterfactual event. Additionally, we include 81,407 counterfactual “branches” without a rewritten storyline to support future work on semi- or un-supervised approaches to counterfactual story rewriting. Finally, we evaluate the counterfactual rewriting capacities of several competitive baselines based on pretrained language models, and assess whether common overlap and model-based automatic metrics for text generation correlate well with human scores for counterfactual rewriting.

2018

pdf bib
Content-Based Citation Recommendation
Chandra Bhagavatula | Sergey Feldman | Russell Power | Waleed Ammar
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present a content-based method for recommending citations in an academic paper draft. We embed a given query document into a vector space, then use its nearest neighbors as candidates, and rerank the candidates using a discriminative model trained to distinguish between observed and unobserved citations. Unlike previous work, our method does not require metadata such as author names which can be missing, e.g., during the peer review process. Without using metadata, our method outperforms the best reported results on PubMed and DBLP datasets with relative improvements of over 18% in F1@20 and over 22% in MRR. We show empirically that, although adding metadata improves the performance on standard metrics, it favors self-citations which are less useful in a citation recommendation setup. We release an online portal for citation recommendation based on our method, (URL: http://bit.ly/citeDemo) and a new dataset OpenCorpus of 7 million research articles to facilitate future research on this task.

pdf bib
Construction of the Literature Graph in Semantic Scholar
Waleed Ammar | Dirk Groeneveld | Chandra Bhagavatula | Iz Beltagy | Miles Crawford | Doug Downey | Jason Dunkelberger | Ahmed Elgohary | Sergey Feldman | Vu Ha | Rodney Kinney | Sebastian Kohlmeier | Kyle Lo | Tyler Murray | Hsu-Han Ooi | Matthew Peters | Joanna Power | Sam Skjonsberg | Lucy Wang | Chris Wilhelm | Zheng Yuan | Madeleine van Zuylen | Oren Etzioni
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction into familiar NLP tasks (e.g., entity extraction and linking), point out research challenges due to differences from standard formulations of these tasks, and report empirical results for each task. The methods described in this paper are used to enable semantic features in www.semanticscholar.org.

pdf bib
Ontology alignment in the biomedical domain using entity definitions and context
Lucy Wang | Chandra Bhagavatula | Mark Neumann | Kyle Lo | Chris Wilhelm | Waleed Ammar
Proceedings of the BioNLP 2018 workshop

Ontology alignment is the task of identifying semantically equivalent entities from two given ontologies. Different ontologies have different representations of the same entity, resulting in a need to de-duplicate entities when merging ontologies. We propose a method for enriching entities in an ontology with external definition and context information, and use this additional information for ontology alignment. We develop a neural architecture capable of encoding the additional information when available, and show that the addition of external data results in an F1-score of 0.69 on the Ontology Alignment Evaluation Initiative (OAEI) largebio SNOMED-NCI subtask, comparable with the entity-level matchers in a SOTA system.

2017

pdf bib
Semi-supervised sequence tagging with bidirectional language models
Matthew Peters | Waleed Ammar | Chandra Bhagavatula | Russell Power
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general semi-supervised approach for adding pretrained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks. We evaluate our model on two standard datasets for named entity recognition (NER) and chunking, and in both cases achieve state of the art results, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task specific gazetteers.

pdf bib
The AI2 system at SemEval-2017 Task 10 (ScienceIE): semi-supervised end-to-end entity and relation extraction
Waleed Ammar | Matthew Peters | Chandra Bhagavatula | Russell Power
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our submission for the ScienceIE shared task (SemEval- 2017 Task 10) on entity and relation extraction from scientific papers. Our model is based on the end-to-end relation extraction model of Miwa and Bansal (2016) with several enhancements such as semi-supervised learning via neural language models, character-level encoding, gazetteers extracted from existing knowledge bases, and model ensembles. Our official submission ranked first in end-to-end entity and relation extraction (scenario 1), and second in the relation-only extraction (scenario 3).

2015

pdf bib
Efficient Methods for Inferring Large Sparse Topic Hierarchies
Doug Downey | Chandra Bhagavatula | Yi Yang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib
Adding High-Precision Links to Wikipedia
Thanapon Noraset | Chandra Bhagavatula | Doug Downey
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)