BioCreative VI Precision Medicine Track: creating a training corpus for mining protein-protein interactions affected by mutations

Rezarta Islamaj Doğan, Andrew Chatr-aryamontri, Sun Kim, Chih-Hsuan Wei, Yifan Peng, Donald Comeau, Zhiyong Lu


Abstract
The Precision Medicine Track in BioCre-ative VI aims to bring together the Bi-oNLP community for a novel challenge focused on mining the biomedical litera-ture in search of mutations and protein-protein interactions (PPI). In order to support this track with an effective train-ing dataset with limited curator time, the track organizers carefully reviewed Pub-Med articles from two different sources: curated public PPI databases, and the re-sults of state-of-the-art public text mining tools. We detail here the data collection, manual review and annotation process and describe this training corpus charac-teristics. We also describe a corpus per-formance baseline. This analysis will provide useful information to developers and researchers for comparing and devel-oping innovative text mining approaches for the BioCreative VI challenge and other Precision Medicine related applica-tions.
Anthology ID:
W17-2321
Volume:
BioNLP 2017
Month:
August
Year:
2017
Address:
Vancouver, Canada,
Venues:
BioNLP | WS
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Note:
Pages:
171–175
Language:
URL:
https://www.aclweb.org/anthology/W17-2321
DOI:
10.18653/v1/W17-2321
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W17-2321.pdf