An Evaluation of the Effect of Automatic Preprocessing on Syntactic Parsing for Biomedical Relation Extraction

Md. Faisal Mahbub Chowdhury, Alberto Lavelli


Abstract
Relation extraction (RE) is an important text mining task that forms the basis for more complex and advanced tasks. In state-of-the-art RE approaches, syntactic information obtained through parsing plays a crucial role. In biomedical RE, previous studies report using various automatic preprocessing techniques before parsing the input text. However, these studies do not specify to what extent such techniques improve RE results, or to what extent they are corpus-specific and parser-specific. In this paper, we address these issues by using various preprocessing techniques, two syntactic tree kernel-based RE approaches, and two different parsers on five widely used benchmark biomedical corpora for the protein-protein interaction (PPI) extraction task. We also analyze various corpus characteristics to verify whether they correlate with the RE results obtained. These analyses of corpus characteristics can also be exploited to compare the five PPI corpora.
Anthology ID:
L12-1290
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
544–551
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/522_Paper.pdf