Niranjan Balasubramanian


2020

Generating Narrative Text in a Switching Dynamical System
Noah Weber | Leena Shekhar | Heeyoung Kwon | Niranjan Balasubramanian | Nathanael Chambers
Proceedings of the 24th Conference on Computational Natural Language Learning

Early work on narrative modeling used explicit plans and goals to generate stories, but the language generation itself was restricted and inflexible. Modern methods use language models for more robust generation, but often lack an explicit representation of the scaffolding and dynamics that guide a coherent narrative. This paper introduces a new model that integrates explicit narrative structure with neural language models, formalizing narrative modeling as a Switching Linear Dynamical System (SLDS). An SLDS is a dynamical system in which the latent dynamics of the system (i.e., how the state vector transforms over time) is controlled by top-level discrete switching variables. The switching variables represent narrative structure (e.g., sentiment or discourse states), while the latent state vector encodes information on the current state of the narrative. This probabilistic formulation allows us to control generation, and can be learned in a semi-supervised fashion using both labeled and unlabeled data. Additionally, we derive a Gibbs sampler for our model that can “fill in” arbitrary parts of the narrative, guided by the switching variables. Our filled-in (English language) narratives outperform several baselines on both automatic and human evaluations.
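
As an illustration of the SLDS definition in the abstract above, the following is a minimal sketch of the generic generative process only: a discrete switch picks which linear map updates the latent state at each step. It is not the authors' model. The function name `sample_slds`, the uniform initial switch, the standard-normal initial state, and the Gaussian noise scale are all assumptions made for this example, and no text decoder is included.

```python
import numpy as np

def sample_slds(num_steps, transition, dynamics, noise_scale=0.1, seed=0):
    """Sample switch labels and latent states from a toy SLDS.

    transition: (K, K) row-stochastic matrix over K switching states.
    dynamics:   (K, D, D) array holding one linear map per switching state.
    """
    rng = np.random.default_rng(seed)
    num_switches, dim, _ = dynamics.shape
    switches = np.zeros(num_steps, dtype=int)
    states = np.zeros((num_steps, dim))
    switch = rng.integers(num_switches)  # assumption: uniform initial switch
    state = rng.normal(size=dim)         # assumption: standard-normal initial state
    for t in range(num_steps):
        if t > 0:
            # The discrete switch evolves as a Markov chain ...
            switch = rng.choice(num_switches, p=transition[switch])
            # ... and selects which linear dynamics transform the latent state.
            state = dynamics[switch] @ state + noise_scale * rng.normal(size=dim)
        switches[t] = switch
        states[t] = state
    return switches, states
```

In a narrative setting each switch value would correspond to a structural label such as a sentiment state, and each latent state would condition the generation of one sentence; both of those mappings are outside this sketch.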

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao | Harsh Trivedi | Aruna Balasubramanian | Niranjan Balasubramanian
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Transformer-based QA models use input-wide self-attention – i.e. across both the question and the input passage – at all layers, causing them to be slow and memory-intensive. It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers. This allows for question-independent processing of the input text representations, which in turn enables pre-computing passage representations, reducing runtime compute drastically. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-training weights of a standard transformer, and directly fine-tune on the target QA dataset. We show DeFormer versions of BERT and XLNet can be used to speed up QA by over 4.3x, and with simple distillation-based losses they incur only a 1% drop in accuracy. We open source the code at https://github.com/StonyBrookNLP/deformer.

Modeling Label Semantics for Predicting Emotional Reactions
Radhika Gaonkar | Heeyoung Kwon | Mohaddeseh Bastan | Niranjan Balasubramanian | Nathanael Chambers
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Predicting how events induce emotions in the characters of a story is typically seen as a standard multi-label classification task, which usually treats labels as anonymous classes to predict. Such approaches ignore information that may be conveyed by the emotion labels themselves. We propose that the semantics of emotion labels can guide a model’s attention when representing the input story. Further, we observe that the emotions evoked by an event are often related: an event that evokes joy is unlikely to also evoke sadness. In this work, we explicitly model label classes via label embeddings, and add mechanisms that track label-label correlations both during training and inference. We also introduce a new semi-supervision strategy that regularizes for the correlations on unlabeled data. Our empirical evaluations show that modeling label semantics yields consistent benefits, and we advance the state-of-the-art on an emotion inference task.

Hierarchical Modeling for User Personality Prediction: The Role of Message-Level Attention
Veronica Lynn | Niranjan Balasubramanian | H. Andrew Schwartz
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Not all documents are equally important. Language processing is increasingly finding use as a supplement for questionnaires to assess psychological attributes of consenting individuals, but most approaches neglect to consider whether all documents of an individual are equally informative. In this paper, we present a novel model that uses message-level attention to learn the relative weight of users’ social media posts for assessing their five factor personality traits. We demonstrate that models with message-level attention outperform those with word-level attention, and ultimately yield state-of-the-art accuracies for all five traits by using both word and message attention in combination with past approaches (an average increase in Pearson r of 2.5%). In addition, examination of the high-signal posts identified by our model provides insight into the relationship between language and personality, helping to inform future work.
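
The idea of message-level attention described in the abstract above can be sketched as soft attention pooling over one user's message vectors. This is a generic illustration, not the paper's architecture: the function name `attend_over_messages`, the dot-product scoring, and the fixed `query` vector (which would be a learned parameter in a trained model) are assumptions made for this example.

```python
import numpy as np

def attend_over_messages(message_vectors, query):
    """Pool a user's message vectors into one user vector with soft attention.

    message_vectors: (M, D) array, one row per message.
    query:           (D,) scoring vector (hypothetical; learned in practice).
    """
    scores = message_vectors @ query
    scores = scores - scores.max()              # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    user_vector = weights @ message_vectors     # attention-weighted average
    return weights, user_vector
```

The returned `weights` indicate how much each message contributes to the user representation, which is the kind of signal one could inspect to find high-weight posts.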

Modeling Preconditions in Text with a Crowd-sourced Dataset
Heeyoung Kwon | Mahnaz Koupaee | Pratyush Singh | Gargi Sawhney | Anmol Shukla | Keerthi Kumar Kallur | Nathanael Chambers | Niranjan Balasubramanian
Findings of the Association for Computational Linguistics: EMNLP 2020

Preconditions provide a form of logical connection between events that explains why some events occur together and information that is complementary to the more widely studied relations such as causation, temporal ordering, entailment, and discourse relations. Modeling preconditions in text has been hampered in part due to the lack of large scale labeled data grounded in text. This paper introduces PeKo, a crowd-sourced annotation of preconditions between event pairs in newswire, an order of magnitude larger than prior text annotations. To complement this new corpus, we also introduce two challenge tasks aimed at modeling preconditions: (i) Precondition Identification – a standard classification task defined over pairs of event mentions, and (ii) Precondition Generation – a generative task aimed at testing a more general ability to reason about a given event. Evaluation on both tasks shows that modeling preconditions is challenging even for today’s large language models (LM). This suggests that precondition knowledge is not easily accessible in LM-derived representations alone. Our generation results show that fine-tuning an LM on PeKo yields better conditional relations than when trained on raw text or temporally-ordered corpora.

Towards Accurate and Reliable Energy Measurement of NLP Models
Qingqing Cao | Aruna Balasubramanian | Niranjan Balasubramanian
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

Accurate and reliable measurement of energy consumption is critical for making well-informed design choices when choosing and training large scale NLP models. In this work, we show that existing software-based energy estimations are not accurate because they do not take into account hardware differences and how resource utilization affects energy consumption. We conduct energy measurement experiments with four different models for a question answering task. We quantify the error of existing software-based energy estimations by using a hardware power meter that provides highly accurate energy measurements. Our key takeaway is the need for a more accurate energy estimation model that takes into account hardware variabilities and the non-linear relationship between resource utilization and energy consumption. We release the code and data at https://github.com/csarron/sustainlp2020-energy.

Author’s Sentiment Prediction
Mohaddeseh Bastan | Mahnaz Koupaee | Youngseo Son | Richard Sicoli | Niranjan Balasubramanian
Proceedings of the 28th International Conference on Computational Linguistics

Even though sentiment analysis has been well-studied on a wide range of domains, there hasn’t been much work on inferring author sentiment in news articles. To address this gap, we introduce PerSenT, a crowd-sourced dataset that captures the sentiment of an author towards the main entity in a news article. Our benchmarks of multiple strong baselines show that this is a difficult classification task. BERT performs the best amongst the baselines. However, it only achieves a modest performance overall suggesting that fine-tuning document-level representations alone isn’t adequate for this task. Making paragraph-level decisions and aggregating over the entire document is also ineffective. We present empirical and qualitative analyses that illustrate the specific challenges posed by this dataset. We release this dataset with 5.3k documents and 38k paragraphs with 3.2k unique entities as a challenge in entity sentiment analysis.

Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning
Harsh Trivedi | Niranjan Balasubramanian | Tushar Khot | Ashish Sabharwal
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Has there been real progress in multi-hop question-answering? Models often exploit dataset artifacts to produce correct answers, without connecting information across multiple supporting facts. This limits our ability to measure true progress and defeats the purpose of building multi-hop QA datasets. We make three contributions towards addressing this. First, we formalize such undesirable behavior as disconnected reasoning across subsets of supporting facts. This allows developing a model-agnostic probe for measuring how much any model can cheat via disconnected reasoning. Second, using a notion of contrastive support sufficiency, we introduce an automatic transformation of existing datasets that reduces the amount of disconnected reasoning. Third, our experiments suggest that there hasn’t been much progress in multi-hop QA in the reading comprehension setting. For a recent large-scale model (XLNet), we show that only 18 points out of its answer F1 score of 72 on HotpotQA are obtained through multifact reasoning, roughly the same as that of a simpler RNN baseline. Our transformation substantially reduces disconnected reasoning (19 points in answer F1). It is complementary to adversarial approaches, yielding further reductions in conjunction.

2019

Latent Part-of-Speech Sequences for Neural Machine Translation
Xuewen Yang | Yingru Liu | Dongliang Xie | Xin Wang | Niranjan Balasubramanian
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Learning target side syntactic structure has been shown to improve Neural Machine Translation (NMT). However, incorporating syntax through latent variables introduces additional complexity in inference, as the models need to marginalize over the latent syntactic structures. To avoid this, models often resort to greedy search which only allows them to explore a limited portion of the latent space. In this work, we introduce a new latent variable model, LaSyn, that captures the co-dependence between syntax and semantics, while allowing for effective and efficient inference over the latent space. LaSyn decouples direct dependence between successive latent variables, which allows its decoder to exhaustively search through the latent syntactic choices, while keeping decoding speed proportional to the size of the latent variable vocabulary. We implement LaSyn by modifying a transformer-based NMT system and design a neural expectation maximization algorithm that we regularize with part-of-speech information as the latent sequences. Evaluations on four different MT tasks show that incorporating target side syntax with LaSyn improves translation quality and also provides an opportunity to improve diversity.

Tweet Classification without the Tweet: An Empirical Examination of User versus Document Attributes
Veronica Lynn | Salvatore Giorgi | Niranjan Balasubramanian | H. Andrew Schwartz
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science

NLP naturally puts a primary focus on leveraging document language, occasionally considering user attributes as supplemental. However, as we tackle more social scientific tasks, it is possible user attributes might be of primary importance and the document supplemental. Here, we systematically investigate the predictive power of user-level features alone versus document-level features for document-level tasks. We first show user attributes can sometimes carry more task-related information than the document itself. For example, a tweet-level stance detection model using only 13 user-level attributes (i.e. features that did not depend on the specific tweet) was able to obtain a higher F1 than the top-performing SemEval participant. We then consider multiple tasks and a wider range of user attributes, showing the performance of strong document-only models can often be improved (as in stance, sentiment, and sarcasm) with user attributes, particularly benefiting tasks with stable “trait-like” outcomes (e.g. stance) most relative to frequently changing “state-like” outcomes (e.g. sentiment). These results not only support the growing work on integrating user factors into predictive systems, but also suggest that some of our NLP tasks might be better cast primarily as user-level (or human) tasks.

PoMo: Generating Entity-Specific Post-Modifiers in Context
Jun Seok Kang | Robert Logan | Zewei Chu | Yang Chen | Dheeru Dua | Kevin Gimpel | Sameer Singh | Niranjan Balasubramanian
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce entity post-modifier generation as an instance of a collaborative writing task. Given a sentence about a target entity, the task is to automatically generate a post-modifier phrase that provides contextually relevant information about the entity. For example, for the sentence, “Barack Obama, _______, supported the #MeToo movement.”, the phrase “a father of two girls” is a contextually relevant post-modifier. To this end, we build PoMo, a post-modifier dataset created automatically from news articles reflecting a journalistic need for incorporating entity information that is relevant to a particular news event. PoMo consists of more than 231K sentences with post-modifiers and associated facts extracted from Wikidata for around 57K unique entities. We use crowdsourcing to show that modeling contextual relevance is necessary for accurate post-modifier generation. We adapt a number of existing generation approaches as baselines for this dataset. Our results show there is large room for improvement in terms of both identifying relevant facts to include (knowing which claims are relevant gives a >20% improvement in BLEU score), and generating appropriate post-modifier text for the context (providing relevant claims is not sufficient for accurate generation). We conduct an error analysis that suggests promising directions for future research.

Repurposing Entailment for Multi-Hop Question Answering Tasks
Harsh Trivedi | Heeyoung Kwon | Tushar Khot | Ashish Sabharwal | Niranjan Balasubramanian
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Question Answering (QA) naturally reduces to an entailment problem, namely, verifying whether some text entails the answer to a question. However, for multi-hop QA tasks, which require reasoning with multiple sentences, it remains unclear how best to utilize entailment models pre-trained on large scale datasets such as SNLI, which are based on sentence pairs. We introduce Multee, a general architecture that can effectively use entailment models for multi-hop QA tasks. Multee uses (i) a local module that helps locate important sentences, thereby avoiding distracting information, and (ii) a global module that aggregates information by effectively incorporating importance weights. Importantly, we show that both modules can use entailment functions pre-trained on large scale NLI datasets. We evaluate performance on MultiRC and OpenBookQA, two multihop QA datasets. When using an entailment function pre-trained on NLI datasets, Multee outperforms QA models trained only on the target QA datasets and the OpenAI transformer models.

2018

Residualized Factor Adaptation for Community Social Media Prediction Tasks
Mohammadzaman Zamani | H. Andrew Schwartz | Veronica Lynn | Salvatore Giorgi | Niranjan Balasubramanian
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Predictive models over social media language have shown promise in capturing community outcomes, but approaches thus far largely neglect the socio-demographic context (e.g. age, education rates, race) of the community from which the language originates. For example, it may be inaccurate to assume people in Mobile, Alabama, where the population is relatively older, will use words the same way as those from San Francisco, where the median age is younger with a higher rate of college education. In this paper, we present residualized factor adaptation, a novel approach to community prediction tasks which both (a) effectively integrates community attributes and (b) adapts linguistic features to community attributes (factors). We use eleven demographic and socioeconomic attributes, and evaluate our approach over five different community-level predictive tasks, spanning health (heart disease mortality, percent fair/poor health), psychology (life satisfaction), and economics (percent housing price increase, foreclosure rate). Our evaluation shows that residualized factor adaptation significantly improves 4 out of 5 community-level outcome predictions over prior state-of-the-art for incorporating socio-demographic contexts.

Hierarchical Quantized Representations for Script Generation
Noah Weber | Leena Shekhar | Niranjan Balasubramanian | Nathanael Chambers
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Scripts define knowledge about how everyday scenarios (such as going to a restaurant) are expected to unfold. One of the challenges to learning scripts is the hierarchical nature of the knowledge. For example, a suspect arrested might plead innocent or guilty, and a very different track of events is then expected to happen. To capture this type of information, we propose an autoencoder model with a latent space defined by a hierarchy of categorical variables. We utilize a recently proposed vector quantization based approach, which allows continuous embeddings to be associated with each latent variable value. This permits the decoder to softly decide what portions of the latent hierarchy to condition on by attending over the value embeddings for a given setting. Our model effectively encodes and generates scripts, outperforming a recent language modeling-based method on several standard tasks, and allowing the autoencoder model to achieve substantially lower perplexity scores compared to the previous language modeling-based method.

The Fine Line between Linguistic Generalization and Failure in Seq2Seq-Attention Models
Noah Weber | Leena Shekhar | Niranjan Balasubramanian
Proceedings of the Workshop on Generalization in the Age of Deep Learning

Seq2Seq based neural architectures have become the go-to architecture to apply to sequence to sequence language tasks. Despite their excellent performance on these tasks, recent work has noted that these models typically do not fully capture the linguistic structure required to generalize beyond the dense sections of the data distribution (Ettinger et al., 2017), and as such, are likely to fail on examples from the tail end of the distribution (such as inputs that are noisy (Belinkov and Bisk, 2018), or of different length (Bentivogli et al., 2016)). In this paper we look at a model’s ability to generalize on a simple symbol rewriting task with a clearly defined structure. We find that the model’s ability to generalize this structure beyond the training distribution depends greatly on the chosen random seed, even when performance on the test set remains the same. This finding suggests that a model’s ability to capture generalizable structure is highly sensitive, and, moreover, this sensitivity may not be apparent when evaluating the model on standard test sets.

2017

Human Centered NLP with User-Factor Adaptation
Veronica Lynn | Youngseo Son | Vivek Kulkarni | Niranjan Balasubramanian | H. Andrew Schwartz
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We pose the general task of user-factor adaptation – adapting supervised learning models to real-valued user factors inferred from a background of their language, reflecting the idea that a piece of text should be understood within the context of the user that wrote it. We introduce a continuous adaptation technique, suited for real-valued user factors that are common in social science and bringing us closer to personalized NLP, adapting to each user uniquely. We apply this technique with known user factors including age, gender, and personality traits, as well as latent factors, evaluating over five tasks: POS tagging, PP-attachment, sentiment analysis, sarcasm detection, and stance detection. Adaptation provides statistically significant benefits for 3 of the 5 tasks: up to +1.2 points for PP-attachment, +3.4 points for sarcasm, and +3.0 points for stance.

2016

Cross Sentence Inference for Process Knowledge
Samuel Louvan | Chetan Naik | Sadhana Kumaravel | Heeyoung Kwon | Niranjan Balasubramanian | Peter Clark
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

What’s in an Explanation? Characterizing Knowledge and Inference Requirements for Elementary Science Exams
Peter Jansen | Niranjan Balasubramanian | Mihai Surdeanu | Peter Clark
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

QA systems have been making steady advances in the challenging elementary science exam domain. In this work, we develop an explanation-based analysis of knowledge and inference requirements, which supports a fine-grained characterization of the challenges. In particular, we model the requirements based on appropriate sources of evidence to be used for the QA task. We create requirements by first identifying suitable sentences in a knowledge base that support the correct answer, then use these to build explanations, filling in any necessary missing information. These explanations are used to create a fine-grained categorization of the requirements. Using these requirements, we compare a retrieval and an inference solver on 212 questions. The analysis validates the gains of the inference solver, demonstrating that it answers more questions requiring complex inference, while also providing insights into the relative strengths of the solvers and knowledge sources. We release the annotated questions and explanations as a resource with broad utility for science exam QA, including determining knowledge base construction targets, as well as supporting information aggregation in automated inference.

2015

Exploring Markov Logic Networks for Question Answering
Tushar Khot | Niranjan Balasubramanian | Eric Gribkoff | Ashish Sabharwal | Peter Clark | Oren Etzioni
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2013

Generating Coherent Event Schemas at Scale
Niranjan Balasubramanian | Stephen Soderland | Mausam | Oren Etzioni
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

Constructing a Textual KB from a Biology TextBook
Peter Clark | Phil Harrison | Niranjan Balasubramanian | Oren Etzioni
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

Rel-grams: A Probabilistic Model of Relations in Text
Niranjan Balasubramanian | Stephen Soderland | Mausam | Oren Etzioni
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)