Maria Barrett

The Sensitivity of Language Models and Humans to Winograd Schema Perturbations
Mostafa Abdou | Vinit Ravishankar | Maria Barrett | Yonatan Belinkov | Desmond Elliott | Anders Søgaard
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of commonsense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number or gender alternations and synonym replacements than humans, and humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than associative ones.

DaNE: A Named Entity Resource for Danish
Rasmus Hvingelby | Amalie Brogaard Pauli | Maria Barrett | Christina Rosted | Lasse Malm Lidegaard | Anders Søgaard
Proceedings of the 12th Language Resources and Evaluation Conference

We present a named entity annotation for the Danish Universal Dependencies treebank using the CoNLL-2003 annotation scheme: DaNE. It is the largest publicly available Danish named entity gold annotation. We evaluate the quality of our annotations intrinsically by double annotating the entire treebank and extrinsically by comparing our annotations to a recently released named entity annotation of the validation and test sections of the Danish Universal Dependencies treebank. We benchmark the new resource by training and evaluating competitive architectures for supervised named entity recognition (NER), including FLAIR, monolingual (Danish) BERT and multilingual BERT. We explore cross-lingual transfer in multilingual BERT from five related languages in zero-shot and direct transfer setups, and we show that even with our modestly sized training set, we improve Danish NER over a recent cross-lingual approach, as well as over zero-shot transfer from five related languages. Using multilingual BERT, we achieve higher performance by fine-tuning on both DaNE and a larger Bokmål (Norwegian) training set than by using DaNE alone. However, the highest performance is achieved by using a Danish BERT fine-tuned on DaNE. Our dataset enables improvements in, and broader applicability of, Danish NER beyond what cross-lingual methods offer. We perform a thorough error analysis of the predictions of the best models for seen and unseen entities, as well as of their robustness to uncapitalized text. The annotated dataset and all the trained models are made publicly available.

Towards Best Practices for Leveraging Human Language Processing Signals for Natural Language Processing
Nora Hollenstein | Maria Barrett | Lisa Beinborn
Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources

NLP models are imperfect and lack intricate capabilities that humans access automatically when processing speech or reading a text. Human language processing data can be leveraged to increase the performance of models and to pursue explanatory research for a better understanding of the differences between human and machine language processing. We review recent studies leveraging different types of cognitive processing signals, namely eye-tracking, M/EEG and fMRI data recorded during language understanding. We discuss the role of cognitive data for machine learning-based NLP methods and identify fundamental challenges for processing pipelines. Finally, we propose practical strategies for using these types of cognitive signals to enhance NLP models.

Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias
Ana Valeria González | Maria Barrett | Rasmus Hvingelby | Kellie Webster | Anders Søgaard
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The one-sided focus on English in previous studies of gender bias in NLP misses out on opportunities in other languages: English challenge datasets such as GAP and WinoGender highlight model preferences that are “hallucinatory”, e.g., disambiguating gender-ambiguous occurrences of ‘doctor’ as male doctors. We show that for languages with type B reflexivization, e.g., Swedish and Russian, we can construct multi-task challenge datasets for detecting gender bias that lead to unambiguously wrong model predictions: In these languages, the direct translation of ‘the doctor removed his mask’ is not ambiguous between a coreferential reading and a disjoint reading. Instead, the coreferential reading requires a non-gendered pronoun, and the gendered, possessive pronouns are anti-reflexive. We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks and focuses only on this phenomenon. We find evidence for gender bias across all task-language combinations and correlate model bias with national labor market statistics.

Adversarial Removal of Demographic Attributes Revisited
Maria Barrett | Yova Kementchedjhieva | Yanai Elazar | Desmond Elliott | Anders Søgaard
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. We further show that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples. In other words, the biases detected in Elazar and Goldberg (2018) seem restricted to their particular data sample, and would therefore not bias the decisions of the model on new samples, whether in-domain or out-of-domain. In light of this, we discuss better methodologies for detecting bias in our models.

Sequence Classification with Human Attention
Maria Barrett | Joachim Bingel | Nora Hollenstein | Marek Rei | Anders Søgaard
Proceedings of the 22nd Conference on Computational Natural Language Learning

Learning attention functions requires large volumes of data, but many NLP tasks simulate human behavior. In this paper, we show that human attention provides a good inductive bias for many attention functions in NLP. Specifically, we use estimated human attention derived from eye-tracking corpora to regularize attention functions in recurrent neural networks. We show substantial improvements across a range of tasks, including sentiment analysis, grammatical error detection, and detection of abusive language.
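
The regularization idea in this abstract can be illustrated with a minimal sketch. It assumes a squared-error penalty between the model's softmax attention over a sentence and per-token fixation durations normalized to sum to one, added to the task loss with a hypothetical weight `lam`. The function names and the choice of distance are assumptions for illustration, not the paper's implementation.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw attention scores into weights that sum to one."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(values: List[float]) -> List[float]:
    """Scale non-negative fixation durations into a distribution over tokens."""
    total = sum(values)
    if total <= 0:
        raise ValueError("fixation durations must sum to a positive value")
    return [v / total for v in values]


def attention_regularized_loss(
    task_loss: float,
    attention_scores: List[float],
    fixation_durations: List[float],
    lam: float = 0.1,  # hypothetical weight on the human-attention penalty
) -> Tuple[float, float]:
    """Return (combined loss, penalty) for one sentence.

    The penalty is the summed squared difference between the model's
    attention distribution and the normalized human fixation durations
    (an assumed distance for this sketch).
    """
    if len(attention_scores) != len(fixation_durations):
        raise ValueError("need one attention score and one fixation value per token")
    model_attention = softmax(attention_scores)
    human_attention = normalize(fixation_durations)
    penalty = sum((a - h) ** 2 for a, h in zip(model_attention, human_attention))
    return task_loss + lam * penalty, penalty


# Uniform model attention vs. reading time concentrated on the first token.
loss, penalty = attention_regularized_loss(1.0, [0.0, 0.0, 0.0], [250.0, 0.0, 0.0], lam=0.3)
```

In the usage line, uniform attention over three tokens compared with fixations on the first token only gives a penalty of (2/3)^2 + 2(1/3)^2 = 2/3, so the combined loss is 1.0 + 0.3 * 2/3 = 1.2. When the model's attention already matches the normalized fixations, the penalty is zero and only the task loss remains.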

Unsupervised Induction of Linguistic Categories with Records of Reading, Speaking, and Writing
Maria Barrett | Ana Valeria González-Garduño | Lea Frermann | Anders Søgaard
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

When learning POS taggers and syntactic chunkers for low-resource languages, different resources may be available, and often all we have is a small tag dictionary, motivating type-constrained unsupervised induction. Even small dictionaries can improve the performance of unsupervised induction algorithms. This paper shows that performance can be further improved by including data that is readily available or can be easily obtained for most languages, i.e., eye-tracking, speech, or keystroke logs (or any combination thereof). We project information from all these data sources into shared spaces, in which the union of words is represented. For English unsupervised POS induction, the additional information, which is not required at test time, leads to an average error reduction of 1.5% across OntoNotes domains over systems augmented with state-of-the-art word embeddings. On the Penn Treebank, the best model achieves a 5.4% error reduction over a word embeddings baseline. We also achieve significant improvements for syntactic chunk induction. Our analysis shows that the improvements are even bigger when the available tag dictionaries are smaller.
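
Type-constrained induction, as named in this abstract, restricts each token to the tags that a (possibly small) tag dictionary lists for its word type, with unknown words free to take any tag. The sketch below shows only that constraint, applied at decoding time over per-token scores; the five-tag tag set, the score dictionaries, and the function names are hypothetical, and the paper's induction model and its gaze, speech, and keystroke features are not reproduced.

```python
from typing import Dict, List, Set

# Hypothetical coarse tag set for this sketch.
TAGSET: Set[str] = {"ADJ", "ADP", "DET", "NOUN", "VERB"}


def allowed_tags(token: str, tag_dict: Dict[str, Set[str]]) -> Set[str]:
    """Tags permitted for a token: its dictionary entry, or every tag if the word is unknown."""
    return set(tag_dict.get(token.lower(), TAGSET))


def constrained_decode(
    tokens: List[str],
    scores: List[Dict[str, float]],
    tag_dict: Dict[str, Set[str]],
) -> List[str]:
    """Pick, per token, the highest-scoring tag among the dictionary-allowed ones."""
    tags = []
    for token, token_scores in zip(tokens, scores):
        # Sort candidates so ties are broken deterministically (alphabetically).
        candidates = sorted(allowed_tags(token, tag_dict))
        # Tags the model did not score are treated as impossible.
        tags.append(max(candidates, key=lambda t: token_scores.get(t, float("-inf"))))
    return tags


tag_dict = {"the": {"DET"}, "run": {"NOUN", "VERB"}}
scores = [
    {"NOUN": 0.7, "DET": 0.2},
    {"NOUN": 0.6, "VERB": 0.3},
    {"ADJ": 0.9, "VERB": 0.5, "NOUN": 0.4},
]
tags = constrained_decode(["The", "dogs", "run"], scores, tag_dict)
```

Here "The" is forced to DET despite its higher NOUN score, the unknown word "dogs" keeps its unconstrained best tag (NOUN), and "run" falls back from the disallowed ADJ to VERB. This is why even a small dictionary can help: it removes implausible tags for frequent word types.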

Predicting misreadings from gaze in children with reading difficulties
Joachim Bingel | Maria Barrett | Sigrid Klerke
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

We present the first work on predicting reading mistakes in children with reading difficulties based on eye-tracking data from real-world reading teaching. Our approach employs several linguistic and gaze-based features to inform an ensemble of different classifiers, including multi-task learning models that let us transfer knowledge about individual readers to attain better predictions. Notably, the data we use in this work stems from noisy readings in the wild, outside of controlled lab conditions. Our experiments show that despite the noise and despite the small fraction of misreadings, gaze data improves performance more than any other feature group, and our models achieve good performance. We further show that gaze patterns for misread words do not fully generalize across readers, but that we can transfer some knowledge between readers using multi-task learning, in at least some cases. Applications of our models include partial automation of reading assessment as well as personalized text simplification.

Cross-lingual Transfer of Correlations between Parts of Speech and Gaze Features
Maria Barrett | Frank Keller | Anders Søgaard
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Several recent studies have shown that eye movements during reading provide information about grammatical and syntactic processing, which can assist the induction of NLP models. All these studies have been limited to English, however. This study shows that gaze and part of speech (PoS) correlations largely transfer across English and French. This means that we can replicate previous studies on gaze-based PoS tagging for French, but also that we can use English gaze data to assist the induction of French NLP models.

Extracting token-level signals of syntactic processing from fMRI - with an application to PoS induction
Joachim Bingel | Maria Barrett | Anders Søgaard
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Weakly Supervised Part-of-speech Tagging Using Eye-tracking Data
Maria Barrett | Joachim Bingel | Frank Keller | Anders Søgaard
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Reading behavior predicts syntactic categories
Maria Barrett | Anders Søgaard
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

Using reading behavior to predict grammatical functions
Maria Barrett | Anders Søgaard
Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning

Reading metrics for estimating task efficiency with MT output
Sigrid Klerke | Sheila Castilho | Maria Barrett | Anders Søgaard
Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning