Mark Yatskar


2020

pdf bib
What Does BERT with Vision Look At?
Liunian Harold Li | Mark Yatskar | Da Yin | Cho-Jui Hsieh | Kai-Wei Chang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Pre-trained visually grounded language models such as ViLBERT, LXMERT, and UNITER have achieved significant performance improvement on vision-and-language tasks but what they learn during pre-training remains unclear. In this work, we demonstrate that certain attention heads of a visually grounded language model actively ground elements of language to image regions. Specifically, some heads can map entities to image regions, performing the task known as entity grounding. Some heads can even detect the syntactic relations between non-entity words and image regions, tracking, for example, associations between verbs and regions corresponding to their arguments. We denote this ability as syntactic grounding. We verify grounding both quantitatively and qualitatively, using Flickr30K Entities as a testbed.

pdf bib
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles
Christopher Clark | Mark Yatskar | Luke Zettlemoyer
Findings of the Association for Computational Linguistics: EMNLP 2020

Many datasets have been shown to contain incidental correlations created by idiosyncrasies in the data collection process. For example, sentence entailment datasets can have spurious word-class correlations if nearly all contradiction sentences contain the word “not”, and image recognition datasets can have tell-tale object-background correlations if dogs are always indoors. In this paper, we propose a method that can automatically detect and ignore these kinds of dataset-specific patterns, which we call dataset biases. Our method trains a lower capacity model in an ensemble with a higher capacity model. During training, the lower capacity model learns to capture relatively shallow correlations, which we hypothesize are likely to reflect dataset bias. This frees the higher capacity model to focus on patterns that should generalize better. We ensure the models learn non-overlapping approaches by introducing a novel method to make them conditionally independent. Importantly, our approach does not require the bias to be known in advance. We evaluate performance on synthetic datasets, and four datasets built to penalize models that exploit known biases on textual entailment, visual question answering, and image recognition tasks. We show improvement in all settings, including a 10 point gain on the visual question answering dataset.

2019

pdf bib
Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
Christopher Clark | Mark Yatskar | Luke Zettlemoyer
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

State-of-the-art models often make use of superficial patterns in the data that do not generalize well to out-of-domain or adversarial settings. For example, textual entailment models often learn that particular key words imply entailment, irrespective of context, and visual question answering models learn to predict prototypical answers, without considering evidence in the image. In this paper, we show that if we have prior knowledge of such biases, we can train a model to be more robust to domain shift. Our method has two stages: we (1) train a naive model that makes predictions exclusively based on dataset biases, and (2) train a robust model as part of an ensemble with the naive one in order to encourage it to focus on other patterns in the data that are more likely to generalize. Experiments on five datasets with out-of-domain test sets show significantly improved robustness in all settings, including a 12 point gain on a changing priors visual question answering dataset and a 9 point gain on an adversarial question answering test set.

pdf bib
Gender Bias in Contextualized Word Embeddings
Jieyu Zhao | Tianlu Wang | Mark Yatskar | Ryan Cotterell | Vicente Ordonez | Kai-Wei Chang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo’s contextualized word vectors. First, we conduct several intrinsic analyses and find that (1) training data for ELMo contains significantly more male than female entities, (2) the trained ELMo embeddings systematically encode gender information and (3) ELMo unequally encodes gender information about male and female entities. Then, we show that a state-of-the-art coreference system that depends on ELMo inherits its bias and demonstrates significant bias on the WinoBias probing corpus. Finally, we explore two methods to mitigate such gender bias and show that the bias demonstrated on WinoBias can be eliminated.

pdf bib
A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC
Mark Yatskar
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We compare three new datasets for question answering: SQuAD 2.0, QuAC, and CoQA, along several of their new features: (1) unanswerable questions, (2) multi-turn interactions, and (3) abstractive answers.We show that the datasets provide complementary coverage of the first two aspects, but weak coverage of the third.Because of the datasets’ structural similarity, a single extractive model can be easily adapted to any of the datasets and we show improved baseline results on both SQuAD 2.0 and CoQA. Despite the similarity, models trained on one dataset are ineffective on another dataset, but we find moderate performance improvement through pretraining. To encourage cross-evaluation, we release code for conversion between datasets.

2018

pdf bib
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
Jieyu Zhao | Tianlu Wang | Mark Yatskar | Vicente Ordonez | Kai-Wei Chang
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

In this paper, we introduce a new benchmark for co-reference resolution focused on gender bias, WinoBias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred by their occupation (e.g. the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than anti-stereotypical entities, by an average difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation approach that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by these systems in WinoBias without significantly affecting their performance on existing datasets.

pdf bib
QuAC: Question Answering in Context
Eunsol Choi | He He | Mohit Iyyer | Mark Yatskar | Wen-tau Yih | Yejin Choi | Percy Liang | Luke Zettlemoyer
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total). The dialogs involve two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as we show in a detailed qualitative evaluation. We also report results for a number of reference models, including a recently state-of-the-art reading comprehension architecture extended to model dialog context. Our best model underperforms humans by 20 F1, suggesting that there is significant room for future work on this data. Dataset, baseline, and leaderboard available at http://quac.ai.

pdf bib
Proceedings of the Workshop on Generalization in the Age of Deep Learning
Yonatan Bisk | Omer Levy | Mark Yatskar
Proceedings of the Workshop on Generalization in the Age of Deep Learning

2017

pdf bib
Neural AMR: Sequence-to-Sequence Models for Parsing and Generation
Ioannis Konstas | Srinivasan Iyer | Mark Yatskar | Yejin Choi | Luke Zettlemoyer
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sequence-to-sequence models have shown strong performance across a broad range of applications. However, their application to parsing and generating text using Abstract Meaning Representation (AMR) has been limited, due to the relatively limited amount of labeled data and the non-sequential nature of the AMR graphs. We present a novel training procedure that can lift this limitation using millions of unlabeled sentences and careful preprocessing of the AMR graphs. For AMR parsing, our model achieves competitive results of 62.1 SMATCH, the current best score reported without significant use of external semantic resources. For AMR generation, our model establishes a new state-of-the-art performance of BLEU 33.8. We present extensive ablative and qualitative analysis including strong evidence that sequence-based AMR models are robust against ordering variations of graph-to-sequence conversions.

pdf bib
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
Jieyu Zhao | Tianlu Wang | Mark Yatskar | Vicente Ordonez | Kai-Wei Chang
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Language is increasingly being used to de-fine rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora. In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference. Our method results in almost no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 47.5% and 40.5% for multilabel classification and visual semantic role labeling, respectively。

2016

pdf bib
Stating the Obvious: Extracting Visual Common Sense Knowledge
Mark Yatskar | Vicente Ordonez | Ali Farhadi
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
See No Evil, Say No Evil: Description Generation from Densely Labeled Images
Mark Yatskar | Michel Galley | Lucy Vanderwende | Luke Zettlemoyer
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

2013

pdf bib
Learning to Relate Literal and Sentimental Descriptions of Visual Properties
Mark Yatskar | Svitlana Volkova | Asli Celikyilmaz | Bill Dolan | Luke Zettlemoyer
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia
Mark Yatskar | Bo Pang | Cristian Danescu-Niculescu-Mizil | Lillian Lee
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics