Dataset Mention Extraction and Classification

Animesh Prasad, Chenglei Si, Min-Yen Kan


Abstract
Datasets are integral artifacts of empirical scientific research. However, due to natural language variation, their recognition can be difficult and even when identified, can often be inconsistently referred across and within publications. We report our approach to the Coleridge Initiative’s Rich Context Competition, which tasks participants with identifying dataset surface forms (dataset mention extraction) and associating the extracted mention to its referred dataset (dataset classification). In this work, we propose various neural baselines and evaluate these model on one-plus and zero-shot classification scenarios. We further explore various joint learning approaches - exploring the synergy between the tasks - and report the issues with such techniques.
Anthology ID:
W19-2604
Volume:
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Venues:
NAACL | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
31–36
Language:
URL:
https://www.aclweb.org/anthology/W19-2604
DOI:
10.18653/v1/W19-2604
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W19-2604.pdf