Discourse-Wide Extraction of Assay Frames from the Biological Literature

Dayne Freitag, Paul Kalmar, Eric Yeh


Abstract
We consider the problem of populating multi-part knowledge frames from textual information distributed over multiple sentences in a document. We present a corpus constructed by aligning papers from the cellular signaling literature to a collection of approximately 50,000 reference frames curated by hand as part of a decade-long project. We present and evaluate two approaches to the challenging problem of reconstructing these frames, which formalize biological assays described in the literature. One approach classifies candidate records nominated by sentence-local entity co-occurrence. In the second approach, we introduce a novel virtual register machine, trained on our reference data, that traverses an article and generates frames. Our evaluations show that success in the task ultimately hinges on integrating evidence spread across the discourse.
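The first approach, classifying candidate records nominated by sentence-local entity co-occurrence, can be illustrated with a minimal sketch. This is not the authors' implementation. The `Mention` record, the `nominate_candidates` function, and the entity types `protein` and `assay` are hypothetical stand-ins for real frame slots, and the classifier that would then score each candidate is omitted.

```python
from itertools import product
from typing import NamedTuple


class Mention(NamedTuple):
    sentence: int  # index of the sentence containing the mention
    text: str      # surface string of the mention
    etype: str     # hypothetical entity type, e.g. "protein" or "assay"


def nominate_candidates(mentions, required_types=("protein", "assay")):
    """Return one candidate record per combination of required entity types
    whose mentions all occur within the same sentence."""
    # Group mentions by sentence so that only sentence-local pairs are formed.
    by_sentence = {}
    for m in mentions:
        by_sentence.setdefault(m.sentence, []).append(m)

    candidates = []
    for sent_idx in sorted(by_sentence):
        group = by_sentence[sent_idx]
        # One pool of mentions per required slot type in this sentence.
        pools = [[m for m in group if m.etype == t] for t in required_types]
        if not all(pools):
            continue  # some slot type is absent from this sentence
        for combo in product(*pools):
            record = {"sentence": sent_idx}
            record.update({t: m.text for t, m in zip(required_types, combo)})
            candidates.append(record)
    return candidates


# Hypothetical mentions from a three-sentence passage.
mentions = [
    Mention(0, "EGFR", "protein"),
    Mention(0, "immunoprecipitation", "assay"),
    Mention(1, "Grb2", "protein"),
    Mention(2, "western blot", "assay"),
]
candidates = nominate_candidates(mentions)
print(candidates)
```

Because nomination only combines mentions within a single sentence, `Grb2` and `western blot` are never paired in this example even if they describe the same assay. This illustrates why frames whose parts are spread across sentences require integrating evidence over the discourse.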
Anthology ID:
W17-8003
Volume:
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Venues:
RANLP | WS
Publisher:
INCOMA Ltd.
Pages:
15–23
URL:
https://doi.org/10.26615/978-954-452-044-1_003
DOI:
10.26615/978-954-452-044-1_003