Allyson Ettinger


2020

Spying on Your Neighbors: Fine-grained Probing of Contextual Embeddings for Information about Surrounding Words
Josef Klafka | Allyson Ettinger
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Although models using contextual word embeddings have achieved state-of-the-art results on a host of NLP tasks, little is known about exactly what information these embeddings encode about the context words that they are understood to reflect. To address this question, we introduce a suite of probing tasks that enable fine-grained testing of contextual embeddings for encoding of information about surrounding words. We apply these tasks to examine the popular BERT, ELMo and GPT contextual encoders, and find that each of our tested information types is indeed encoded as contextual information across tokens, often with near-perfect recoverability—but the encoders vary in which features they distribute to which tokens, how nuanced their distributions are, and how robust the encoding of each feature is to distance. We discuss implications of these results for how different types of models break down and prioritize word-level context information when constructing token embeddings.

PeTra: A Sparsely Supervised Memory Model for People Tracking
Shubham Toshniwal | Allyson Ettinger | Kevin Gimpel | Karen Livescu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose PeTra, a memory-augmented neural network designed to track entities in its memory slots. PeTra is trained using sparse annotation from the GAP pronoun resolution dataset and outperforms a prior memory model on the task while using a simpler architecture. We empirically compare key modeling choices, finding that we can simplify several aspects of the design of the memory module while retaining strong performance. To measure the people tracking capability of memory models, we (a) propose a new diagnostic evaluation based on counting the number of unique entities in text, and (b) conduct a small scale human evaluation to compare evidence of people tracking in the memory logs of PeTra relative to a previous approach. PeTra is highly effective in both evaluations, demonstrating its ability to track people in its memory despite being trained with limited annotation.

Exploring BERT’s Sensitivity to Lexical Cues using Tests from Semantic Priming
Kanishka Misra | Allyson Ettinger | Julia Rayz
Findings of the Association for Computational Linguistics: EMNLP 2020

Models trained to estimate word probabilities in context have become ubiquitous in natural language processing. How do these models use lexical cues in context to inform their word probabilities? To answer this question, we present a case study analyzing the pre-trained BERT model with tests informed by semantic priming. Using English lexical stimuli that show priming in humans, we find that BERT too shows “priming”, predicting a word with greater probability when the context includes a related word versus an unrelated one. This effect decreases as the amount of information provided by the context increases. Follow-up analysis shows BERT to be increasingly distracted by related prime words as context becomes more informative, assigning lower probabilities to related words. Our findings highlight the importance of considering contextual constraint effects when studying word prediction in these models, and highlight possible parallels with human processing.

Assessing Phrasal Representation and Composition in Transformers
Lang Yu | Allyson Ettinger
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Deep transformer models have pushed performance on NLP tasks to new limits, suggesting sophisticated treatment of complex linguistic inputs, such as phrases. However, we have limited understanding of how these models handle representation of phrases, and whether this reflects sophisticated composition of phrase meaning like that done by humans. In this paper, we present systematic analysis of phrasal representations in state-of-the-art pre-trained transformers. We use tests leveraging human judgments of phrase similarity and meaning shift, and compare results before and after control of word overlap, to tease apart lexical effects versus composition effects. We find that phrase representation in these models relies heavily on word content, with little evidence of nuanced composition. We also identify variations in phrase representation quality across models, layers, and representation types, and make corresponding recommendations for usage of representations from these models.

Learning to Ignore: Long Document Coreference with Bounded Memory Neural Networks
Shubham Toshniwal | Sam Wiseman | Allyson Ettinger | Karen Livescu | Kevin Gimpel
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Long document coreference resolution remains a challenging task due to the large memory and runtime requirements of current models. Recent work doing incremental coreference resolution using just the global representation of entities shows practical benefits but requires keeping all entities in memory, which can be impractical for long documents. We argue that keeping all entities in memory is unnecessary, and we propose a memory-augmented neural network that tracks only a small bounded number of entities at a time, thus guaranteeing a linear runtime in length of document. We show that (a) the model remains competitive with models with high memory and computational requirements on OntoNotes and LitBank, and (b) the model learns an efficient memory management strategy, easily outperforming a rule-based strategy.

Proceedings of the Society for Computation in Linguistics 2020
Allyson Ettinger | Gaja Jarosz | Joe Pater
Proceedings of the Society for Computation in Linguistics 2020

What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models
Allyson Ettinger
Transactions of the Association for Computational Linguistics, Volume 8

Pre-training by language modeling has become a popular and successful approach to NLP tasks, but we have yet to understand exactly what linguistic capacities these pre-training processes confer upon models. In this paper we introduce a suite of diagnostics drawn from human language experiments, which allow us to ask targeted questions about information used by language models for generating predictions in context. As a case study, we apply these diagnostics to the popular BERT model, finding that it can generally distinguish good from bad completions involving shared category or role reversal, albeit with less sensitivity than humans, and it robustly retrieves noun hypernyms, but it struggles with challenging inference and role-based event prediction; in particular, it shows clear insensitivity to the contextual impacts of negation.

2018

Assessing Composition in Sentence Vector Representations
Allyson Ettinger | Ahmed Elgohary | Colin Phillips | Philip Resnik
Proceedings of the 27th International Conference on Computational Linguistics

An important component of achieving language understanding is mastering the composition of sentence meaning, but an immediate challenge to solving this problem is the opacity of sentence vector representations produced by current neural sentence composition models. We present a method to address this challenge, developing tasks that directly target compositional meaning information in sentence vector representations with a high degree of precision and control. To enable the creation of these controlled tasks, we introduce a specialized sentence generation system that produces large, annotated sentence sets meeting specified syntactic, semantic and lexical constraints. We describe the details of the method and generation system, and then present results of experiments applying our method to probe for compositional information in embeddings from a number of existing sentence composition models. We find that the method is able to extract useful information about the differing capacities of these models, and we discuss the implications of our results with respect to these systems’ capturing of sentence information. We make available for public use the datasets used for these experiments, as well as the generation system.

2017

Proceedings of ACL 2017, Student Research Workshop
Allyson Ettinger | Spandana Gella | Matthieu Labeau | Cecilia Ovesdotter Alm | Marine Carpuat | Mark Dredze
Proceedings of ACL 2017, Student Research Workshop

Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems
Emily Bender | Hal Daumé III | Allyson Ettinger | Sudha Rao
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

Towards Linguistically Generalizable NLP Systems: A Workshop and Shared Task
Allyson Ettinger | Sudha Rao | Hal Daumé III | Emily M. Bender
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

This paper presents a summary of the first Workshop on Building Linguistically Generalizable Natural Language Processing Systems, and the associated Build It Break It, The Language Edition shared task. The goal of this workshop was to bring together researchers in NLP and linguistics with a carefully designed shared task aimed at testing the generalizability of NLP systems beyond the distributions of their training data. We describe the motivation, setup, and participation of the shared task, provide discussion of some highlighted results, and discuss lessons learned.

2016

Retrofitting Sense-Specific Word Vectors Using Parallel Text
Allyson Ettinger | Philip Resnik | Marine Carpuat
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Evaluating vector space models using human semantic priming results
Allyson Ettinger | Tal Linzen
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

Probing for semantic evidence of composition by means of simple classification tasks
Allyson Ettinger | Ahmed Elgohary | Philip Resnik
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

2015

Dialogue focus tracking for zero pronoun resolution
Sudha Rao | Allyson Ettinger | Hal Daumé III | Philip Resnik
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies