Towards Confidence Estimation for Typed Protein-Protein Relation Extraction

Camilo Thorne, Roman Klinger


Abstract
Systems that build on information extraction are typically challenged to extract knowledge that, while correct, is not yet well known. We hypothesize that a good confidence measure for relational information has the property that such interesting information is found between information extracted with very high confidence and information extracted with very low confidence. We discuss confidence estimation for protein-protein relation discovery in the biomedical literature. As facts reported in papers take some time to be validated and recorded in biomedical databases, this task gives rise to large quantities of unknown but potentially true candidate relations. It is thus important to rank them based on supporting evidence rather than discard them. In this paper, we discuss this task, propose different approaches for confidence estimation, and present a pipeline to evaluate such methods. We show that the most straightforward approach, a combination of different confidence measures from pipeline modules, seems not to work well. We discuss this negative result and pinpoint potential future research directions.
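The abstract mentions two ideas: combining confidence measures from pipeline modules, and the hypothesis that interesting candidates lie between the very-high and very-low confidence extremes. The sketch below illustrates both under stated assumptions and is not the paper's method. It assumes that module confidences are combined by a simple product, which treats the modules as independent, and that the "middle" is a fixed band. The `Candidate` class, its fields, and the `low`/`high` thresholds are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Candidate:
    # Hypothetical candidate protein-protein relation with one confidence
    # score per pipeline module (e.g. entity recognition for each protein,
    # relation classification). Field names are illustrative assumptions.
    protein_a: str
    protein_b: str
    relation_type: str
    module_scores: tuple[float, ...]


def combined_confidence(c: Candidate) -> float:
    """Combine module confidences by product (assumes module independence)."""
    return prod(c.module_scores)


def middle_band(cands: list[Candidate], low: float = 0.2, high: float = 0.8) -> list[Candidate]:
    """Return candidates whose combined confidence lies strictly between
    the hypothetical thresholds `low` and `high`, ranked by descending confidence."""
    scored = [(combined_confidence(c), c) for c in cands]
    band = [(s, c) for s, c in scored if low < s < high]
    band.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in band]
```

In this sketch, a candidate scored (0.9, 0.95, 0.99) combines to about 0.85 and falls above the band, while one scored (0.9, 0.8, 0.7) combines to 0.504 and is kept. The product is only the simplest way to combine module scores. The abstract reports that such a straightforward combination seems not to work well, so this should be read as a baseline illustration.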
Anthology ID: W17-8008
Volume: Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
Month: September
Year: 2017
Address: Varna, Bulgaria
Venues: RANLP | WS
Publisher: INCOMA Ltd.
Pages: 55–63
URL: https://doi.org/10.26615/978-954-452-044-1_008
DOI: 10.26615/978-954-452-044-1_008