Easy, Reproducible and Quality-Controlled Data Collection with CROWDAQ

Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasović, Zhen Nie


Abstract
High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce CROWDAQ, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and saved pipelines in a re-usable format. We show that CROWDAQ simplifies data annotation significantly on a diverse set of data collection use cases and we hope it will be a convenient tool for the community.
Anthology ID:
2020.emnlp-demos.17
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month:
October
Year:
2020
Address:
Online
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
127–134
Language:
URL:
https://www.aclweb.org/anthology/2020.emnlp-demos.17
DOI:
10.18653/v1/2020.emnlp-demos.17
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2020.emnlp-demos.17.pdf
Optional supplementary material:
 2020.emnlp-demos.17.OptionalSupplementaryMaterial.pdf