Query selection methods for automated corpora construction with a use case in food-drug interactions

Georgeta Bordea, Tsanta Randriatsitohaina, Fleur Mougin, Natalia Grabar, Thierry Hamon


Abstract
In this paper, we address the problem of automatically constructing a relevant corpus of scientific articles about food-drug interactions. A growing number of scientific publications describe food-drug interactions, but building a high-coverage corpus that can be used for information extraction is currently not trivial. We investigate several methods for automating the query selection process using an expert-curated corpus of food-drug interactions. Our experiments show that index term features combined with a decision tree classifier are the best approach for this task, and that feature selection approaches, in particular gain ratio, outperform frequency-based methods for query selection.
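To make the contrast between gain ratio and frequency-based query selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes each article is represented by a set of index terms and a binary relevance label; the toy corpus, the term names, and the helper functions `entropy` and `gain_ratio` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def gain_ratio(has_term, labels):
    """Gain ratio of a binary term-presence feature with respect to the labels."""
    total = len(labels)
    # Partition the labels by feature value (term present / term absent).
    parts = {}
    for value, label in zip(has_term, labels):
        parts.setdefault(value, []).append(label)
    conditional = sum(len(p) / total * entropy(p) for p in parts.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(has_term)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy corpus: (index terms, 1 = relevant, 0 = irrelevant).
docs = [
    ({"grapefruit juice", "cytochrome"}, 1),
    ({"grapefruit juice", "humans"}, 1),
    ({"humans", "cytochrome"}, 0),
    ({"humans"}, 0),
]
labels = [label for _, label in docs]
terms = sorted(set().union(*(t for t, _ in docs)))

# Score every candidate query term by gain ratio and by document frequency.
scores = {t: gain_ratio([t in d for d, _ in docs], labels) for t in terms}
doc_freq = {t: sum(t in d for d, _ in docs) for t in terms}

ranked_by_gain_ratio = sorted(terms, key=lambda t: scores[t], reverse=True)
ranked_by_frequency = sorted(terms, key=lambda t: doc_freq[t], reverse=True)
print(ranked_by_gain_ratio)
print(ranked_by_frequency)
```

In this toy setting the most frequent term ("humans") appears in both relevant and irrelevant articles, so a frequency-based ranking puts it first, while gain ratio favors the term that best separates the two classes.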
Anthology ID: W19-5013
Volume: Proceedings of the 18th BioNLP Workshop and Shared Task
Month: August
Year: 2019
Address: Florence, Italy
Venues: ACL | BioNLP | WS
SIG: SIGBIOMED
Publisher: Association for Computational Linguistics
Pages: 115–124
URL: https://www.aclweb.org/anthology/W19-5013
DOI: 10.18653/v1/W19-5013
PDF: http://aclanthology.lst.uni-saarland.de/W19-5013.pdf