The BreakingNews Dataset

Arnau Ramisa, Fei Yan, Francesc Moreno-Noguer, Krystian Mikolajczyk


Abstract
We present BreakingNews, a novel dataset of approximately 100K news articles that include images, text and captions, enriched with heterogeneous metadata (e.g. GPS coordinates and popularity metrics). Because images and text in news data are only loosely connected, the dataset is well suited to taking work at the intersection of Computer Vision and Natural Language Processing a step further, and we hope it will help spur progress in the field.
Anthology ID:
W17-2005
Volume:
Proceedings of the Sixth Workshop on Vision and Language
Month:
April
Year:
2017
Address:
Valencia, Spain
Venues:
VL | WS
Publisher:
Association for Computational Linguistics
Pages:
38–39
URL:
https://www.aclweb.org/anthology/W17-2005
DOI:
10.18653/v1/W17-2005
PDF:
http://aclanthology.lst.uni-saarland.de/W17-2005.pdf