A guide to the dataset explosion in QA, NLI, and commonsense reasoning

Anna Rogers, Anna Rumshisky


Abstract
Question answering, natural language inference, and commonsense reasoning are increasingly popular as general NLP system benchmarks, driving both modeling and dataset work. For question answering alone, there are already over 100 datasets, with over 40 published after 2018. However, most new datasets get “solved” soon after publication, largely because of annotation artifacts and shallow cues in the data that models can exploit rather than because of the models' verbal reasoning capabilities. This tutorial aims to (1) provide an up-to-date guide to recent datasets, (2) survey old and new methodological issues with dataset construction, and (3) outline existing proposals for overcoming them. The target audience is NLP practitioners who are lost among the dozens of recent datasets and would like to know what these datasets actually measure. Our overview of the problems with current datasets, and of the latest tips and tricks for overcoming them, will also be useful to researchers working on future benchmarks.
Anthology ID:
2020.coling-tutorials.5
Volume:
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
COLING
Publisher:
International Committee for Computational Linguistics
Pages:
27–32
URL:
https://www.aclweb.org/anthology/2020.coling-tutorials.5
PDF:
http://aclanthology.lst.uni-saarland.de/2020.coling-tutorials.5.pdf