Kellie Webster


2020

Social Biases in NLP Models as Barriers for Persons with Disabilities
Ben Hutchinson | Vinodkumar Prabhakaran | Emily Denton | Kellie Webster | Yu Zhong | Stephen Denuyl
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Building equitable and inclusive NLP technologies demands consideration of whether and how social attitudes are represented in ML models. In particular, representations encoded in models often inadvertently perpetuate undesirable social biases from the data on which they are trained. In this paper, we present evidence of such undesirable biases towards mentions of disability in two different English language models: toxicity prediction and sentiment analysis. Next, we demonstrate that the neural embeddings that are the critical first step in most NLP pipelines similarly contain undesirable biases towards mentions of disability. We end by highlighting topical biases in the discourse about disability which may contribute to the observed model biases; for instance, gun violence, homelessness, and drug addiction are over-represented in texts discussing mental illness.

Automatically Identifying Gender Issues in Machine Translation using Perturbations
Hila Gonen | Kellie Webster
Findings of the Association for Computational Linguistics: EMNLP 2020

The successful application of neural methods to machine translation has realized huge quality advances for the community. With these improvements, many have noted outstanding challenges, including the modeling and treatment of gendered language. While previous studies have identified issues using synthetic examples, we develop a novel technique to mine examples from real world data to explore challenges for deployed systems. We use our method to compile an evaluation benchmark spanning examples for four languages from three language families, which we publicly release to facilitate research. The examples in our benchmark expose where model representations are gendered, and the unintended consequences these gendered representations can have in downstream applications.

Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias
Ana Valeria González | Maria Barrett | Rasmus Hvingelby | Kellie Webster | Anders Søgaard
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The one-sided focus on English in previous studies of gender bias in NLP misses out on opportunities in other languages: English challenge datasets such as GAP and WinoGender highlight model preferences that are “hallucinatory”, e.g., disambiguating gender-ambiguous occurrences of ‘doctor’ as male doctors. We show that for languages with type B reflexivization, e.g., Swedish and Russian, we can construct multi-task challenge datasets for detecting gender bias that lead to unambiguously wrong model predictions: In these languages, the direct translation of ‘the doctor removed his mask’ is not ambiguous between a coreferential reading and a disjoint reading. Instead, the coreferential reading requires a non-gendered pronoun, and the gendered, possessive pronouns are anti-reflexive. We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks and focuses only on this phenomenon. We find evidence for gender bias across all task-language combinations and correlate model bias with national labor market statistics.

2019

Proceedings of the First Workshop on Gender Bias in Natural Language Processing
Marta R. Costa-jussà | Christian Hardmeier | Will Radford | Kellie Webster
Proceedings of the First Workshop on Gender Bias in Natural Language Processing

Gendered Ambiguous Pronoun (GAP) Shared Task at the Gender Bias in NLP Workshop 2019
Kellie Webster | Marta R. Costa-jussà | Christian Hardmeier | Will Radford
Proceedings of the First Workshop on Gender Bias in Natural Language Processing

The 1st ACL workshop on Gender Bias in Natural Language Processing included a shared task on gendered ambiguous pronoun (GAP) resolution. This task was based on the coreference challenge defined in Webster et al. (2018), designed to benchmark the ability of systems to resolve pronouns in real-world contexts in a gender-fair way. 263 teams competed via a Kaggle competition, with the winning system achieving a logloss of 0.13667 and near gender parity. We review the approaches of eleven systems with accepted description papers, noting their effective use of BERT (Devlin et al., 2018), both via fine-tuning and for feature extraction, as well as ensembling.

2018

A Challenge Set and Methods for Noun-Verb Ambiguity
Ali Elkahky | Kellie Webster | Daniel Andor | Emily Pitler
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

English part-of-speech taggers regularly make egregious errors related to noun-verb ambiguity, despite having achieved 97%+ accuracy on the WSJ Penn Treebank since 2002. These mistakes have been difficult to quantify and make taggers less useful to downstream tasks such as translation and text-to-speech synthesis. This paper creates a new dataset of over 30,000 naturally-occurring non-trivial examples of noun-verb ambiguity. Taggers within 1% of each other when measured on the WSJ have accuracies ranging from 57% to 75% on this challenge set. Enhancing the strongest existing tagger with contextual word embeddings and targeted training data improves its accuracy to 89%, a 14% absolute (52% relative) improvement. Downstream, using just this enhanced tagger yields a 28% reduction in error over the prior best learned model for homograph disambiguation for text-to-speech synthesis.

Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns
Kellie Webster | Marta Recasens | Vera Axelrod | Jason Baldridge
Transactions of the Association for Computational Linguistics, Volume 6

Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns a longstanding challenge. Nonetheless, existing corpora do not capture ambiguous pronouns in sufficient volume or diversity to accurately indicate the practical utility of models. Furthermore, we find gender bias in existing corpora and systems favoring masculine entities. To address this, we present and release GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun–name pairs sampled to provide diverse coverage of challenges posed by real-world text. We explore a range of baselines that demonstrate the complexity of the challenge, the best achieving just 66.9% F1. We show that syntactic structure and continuous neural models provide promising, complementary cues for approaching the challenge.

2016

Using mention accessibility to improve coreference resolution
Kellie Webster | Joel Nothman
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

Proceedings of the Australasian Language Technology Association Workshop 2015
Ben Hachey | Kellie Webster
Proceedings of the Australasian Language Technology Association Workshop 2015

2014

Limited memory incremental coreference resolution
Kellie Webster | James R. Curran
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Examining the Impact of Coreference Resolution on Quote Attribution
Tim O’Keefe | Kellie Webster | James R. Curran | Irena Koprinska
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)