Tackling Biomedical Text Summarization: OAQA at BioASQ 5B

Khyathi Chandu, Aakanksha Naik, Aditya Chandrasekar, Zi Yang, Niloy Gupta, Eric Nyberg


Abstract
In this paper, we describe our participation in phase B of task 5b of the fifth edition of the annual BioASQ challenge, which involves answering factoid, list, yes/no, and summary questions from biomedical data. We describe our techniques with an emphasis on ideal answer generation, where the goal is to produce a relevant, precise, non-redundant, query-oriented summary from multiple relevant documents. We use extractive summarization techniques to address this task and experiment with different biomedical ontologies and various algorithms, including agglomerative clustering, Maximum Marginal Relevance (MMR), and sentence compression. We propose a novel word-embedding-based tf-idf similarity metric and a soft positional constraint, both of which improve our system's performance. We evaluate our techniques on test batch 4 from the fourth edition of the challenge. Our best system achieves a ROUGE-2 score of 0.6534 and a ROUGE-SU4 score of 0.6536.
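As an illustration of the MMR step named in the abstract, the following is a minimal sketch of greedy MMR sentence selection. It is not the authors' implementation: it substitutes plain bag-of-words cosine similarity for the paper's word-embedding-based tf-idf metric, and the function names (`cosine`, `mmr_select`) and the parameters `k` and `lam` are hypothetical names chosen for this example.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query: str, sentences: List[str], k: int = 2, lam: float = 0.7) -> List[str]:
    """Greedily pick up to k sentences, trading query relevance against redundancy.

    Assumption: whitespace tokenization and raw term counts stand in for
    whatever similarity a real system would use.
    """
    q = Counter(query.lower().split())
    vecs = [Counter(s.lower().split()) for s in sentences]
    selected: List[int] = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(vecs[i], q)
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return [sentences[i] for i in selected]


if __name__ == "__main__":
    snippets = [
        "aspirin reduces fever",
        "aspirin reduces fever",
        "ibuprofen treats pain",
    ]
    for sentence in mmr_select("aspirin fever pain", snippets, k=2):
        print(sentence)
```

With `lam` close to 1 the selection favors relevance to the query; lowering it penalizes sentences that repeat what has already been selected, which is how MMR addresses the non-redundancy goal.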
Anthology ID: W17-2307
Volume: BioNLP 2017
Month: August
Year: 2017
Address: Vancouver, Canada
Venues: BioNLP | WS
SIG: SIGBIOMED
Publisher: Association for Computational Linguistics
Pages: 58–66
URL: https://www.aclweb.org/anthology/W17-2307
DOI: 10.18653/v1/W17-2307
PDF: http://aclanthology.lst.uni-saarland.de/W17-2307.pdf