Aakanksha Naik


Towards Open Domain Event Trigger Identification using Adversarial Domain Adaptation
Aakanksha Naik | Carolyn Rose
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We tackle the task of building supervised event trigger identification models which can generalize better across domains. Our work leverages the adversarial domain adaptation (ADA) framework to introduce domain-invariance. ADA uses adversarial training to construct representations that are predictive for trigger identification, but not predictive of the example’s domain. It requires no labeled data from the target domain, making it completely unsupervised. Experiments with two domains (English literature and news) show that ADA leads to an average F1 score improvement of 3.9 on out-of-domain data. Our best performing model (BERT-A) reaches 44-49 F1 across both domains, using no labeled target data. Preliminary experiments reveal that finetuning on 1% labeled data, followed by self-training, leads to substantial improvement, reaching 51.5 and 67.2 F1 on literature and news respectively.


EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference
Abhilasha Ravichander | Aakanksha Naik | Carolyn Rose | Eduard Hovy
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Quantitative reasoning is a higher-order reasoning skill that any intelligent natural language understanding system can reasonably be expected to handle. We present EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), a new framework for quantitative reasoning in textual entailment. We benchmark the performance of 9 published NLI models on EQUATE, and find that on average, state-of-the-art methods do not achieve an absolute improvement over a majority-class baseline, suggesting that they do not implicitly learn to reason with quantities. We establish a new baseline, Q-REAS, that manipulates quantities symbolically. In comparison to the best performing NLI model, it achieves success on numerical reasoning tests (+24.2%), but has limited verbal reasoning capabilities (-8.1%). We hope our evaluation framework will support the development of models of quantitative reasoning in language understanding.

Exploring Numeracy in Word Embeddings
Aakanksha Naik | Abhilasha Ravichander | Carolyn Rose | Eduard Hovy
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Word embeddings are now pervasive across NLP subfields as the de-facto method of forming text representations. In this work, we show that existing embedding models are inadequate at constructing representations that capture salient aspects of mathematical meaning for numbers, which is important for language understanding. Numbers are ubiquitous and frequently appear in text. Inspired by cognitive studies on how humans perceive numbers, we develop an analysis framework to test how well word embeddings capture two essential properties of numbers: magnitude (e.g. 3<4) and numeration (e.g. 3=three). Our experiments reveal that most models capture an approximate notion of magnitude, but are inadequate at capturing numeration. We hope that our observations provide a starting point for the development of methods which better capture numeracy in NLP systems.

Using Functional Schemas to Understand Social Media Narratives
Xinru Yan | Aakanksha Naik | Yohan Jo | Carolyn Rose
Proceedings of the Second Workshop on Storytelling

We propose a novel take on understanding narratives in social media, focusing on learning “functional story schemas”, which consist of sets of stereotypical functional structures. We develop an unsupervised pipeline to extract schemas and apply our method to Reddit posts to detect schematic structures that are characteristic of different subreddits. We validate our schemas through human interpretation and evaluate their utility via a text classification task. Our experiments show that extracted schemas capture distinctive structural patterns in different subreddits, improving classification performance of several models by 2.4% on average. We also observe that these schemas serve as lenses that reveal community norms.

TDDiscourse: A Dataset for Discourse-Level Temporal Ordering of Events
Aakanksha Naik | Luke Breitfeller | Carolyn Rose
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Prior work on temporal relation classification has focused extensively on event pairs in the same or adjacent sentences (local), paying scant attention to discourse-level (global) pairs. This restricts the ability of systems to learn temporal links between global pairs, since reliance on local syntactic features suffices to achieve reasonable performance on existing datasets. However, systems should be capable of incorporating cues from document-level structure to assign temporal relations. In this work, we take a first step towards discourse-level temporal ordering by creating TDDiscourse, the first dataset focusing specifically on temporal links between event pairs which are more than one sentence apart. We create TDDiscourse by augmenting TimeBank-Dense, a corpus of English news articles, manually annotating global pairs that cannot be inferred automatically from existing annotations. Our annotations double the number of temporal links in TimeBank-Dense, while possessing several desirable properties such as focusing on long-distance pairs and not being automatically inferable. We adapt and benchmark the performance of three state-of-the-art models on TDDiscourse and observe that existing systems indeed find discourse-level temporal ordering harder.


Stress Test Evaluation for Natural Language Inference
Aakanksha Naik | Abhilasha Ravichander | Norman Sadeh | Carolyn Rose | Graham Neubig
Proceedings of the 27th International Conference on Computational Linguistics

Natural language inference (NLI) is the task of determining if a natural language hypothesis can be inferred from a given premise in a justifiable manner. NLI was proposed as a benchmark task for natural language understanding. Existing models perform well on standard datasets for NLI, achieving impressive results across different genres of text. However, the extent to which these models understand the semantic content of sentences is unclear. In this work, we propose an evaluation methodology consisting of automatically constructed “stress tests” that allow us to examine whether systems have the ability to make real inferential decisions. Our evaluation of six sentence-encoder models on these stress tests reveals strengths and weaknesses of these models with respect to challenging linguistic phenomena, and suggests important directions for future work in this area.


Tackling Biomedical Text Summarization: OAQA at BioASQ 5B
Khyathi Chandu | Aakanksha Naik | Aditya Chandrasekar | Zi Yang | Niloy Gupta | Eric Nyberg
BioNLP 2017

In this paper, we describe our participation in phase B of task 5b of the fifth edition of the annual BioASQ challenge, which includes answering factoid, list, yes-no and summary questions from biomedical data. We describe our techniques with an emphasis on ideal answer generation, where the goal is to produce a relevant, precise, non-redundant, query-oriented summary from multiple relevant documents. We make use of extractive summarization techniques to address this task and experiment with different biomedical ontologies and various algorithms including agglomerative clustering, Maximum Marginal Relevance (MMR) and sentence compression. We propose a novel word embedding based tf-idf similarity metric and a soft positional constraint which improve our system performance. We evaluate our techniques on test batch 4 from the fourth edition of the challenge. Our best system achieves a ROUGE-2 score of 0.6534 and ROUGE-SU4 score of 0.6536.
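
As an illustration of the Maximum Marginal Relevance step named in the abstract above, the following is a minimal sketch of greedy MMR sentence selection. It is not the authors' system: the function names (`cosine`, `mmr_select`), the weight `lam`, and the plain bag-of-words cosine similarity are assumptions chosen for brevity, standing in for the embedding-based tf-idf similarity the paper proposes.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, sentences, k=2, lam=0.7):
    """Greedily pick k sentences, trading query relevance against redundancy.

    lam is a hypothetical trade-off weight: 1.0 ranks by relevance only,
    lower values penalise similarity to sentences already selected.
    """
    q_vec = Counter(query.lower().split())
    vecs = [Counter(s.lower().split()) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], q_vec)
            # Redundancy is the highest similarity to any chosen sentence.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return [sentences[i] for i in selected]


example_sentences = [
    "aspirin reduces fever",
    "aspirin reduces fever",
    "ibuprofen treats pain",
]
summary = mmr_select("aspirin fever pain", example_sentences, k=2)
print(summary)
```

In this toy input the duplicated sentence is skipped in favour of the less relevant but non-redundant one, which is the behaviour that makes MMR a natural fit for non-redundant, query-oriented summaries.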

Extracting Personal Medical Events for User Timeline Construction using Minimal Supervision
Aakanksha Naik | Chris Bogart | Carolyn Rose
BioNLP 2017

In this paper, we describe a system for automatic construction of user disease progression timelines from their posts in online support groups using minimal supervision. In recent years, several online support groups have been established, which has led to a huge increase in the amount of patient-authored text available. Creating systems which can automatically extract important medical events and create disease progression timelines for users from such text can help with patient health monitoring as well as the study of links between medical events and users’ participation in support groups. Prior work in this domain has used manually constructed keyword sets to detect medical events. In this work, our aim is to perform medical event detection using minimal supervision in order to develop a more general timeline construction system. Our system achieves an accuracy of 55.17%, which is 92% of the performance achieved by a supervised baseline system.