Document retrieval and question answering in medical documents. A large-scale corpus challenge.

Curea Eric


Abstract
When applied to large datasets, information retrieval works by first isolating a subset of documents from the collection and then proceeding with low-level processing of the text. The isolation step is usually carried out by adding index terms to each document in the collection. In this paper we deal with automatic document classification and index-term detection applied to large-scale medical corpora. Our methodology employs a linear classifier, and we test our results on the BioASQ training corpus, a collection of 12 million MeSH-indexed medical abstracts. We cover term indexing, result retrieval, and result ranking based on distributed word representations.
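The pipeline outlined in the abstract (index-term assignment with a linear classifier, retrieval of the documents sharing an index term, then ranking by distributed word representations) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the weights in `TERM_MODELS` are hand-set rather than learned, the two-dimensional `WORD_VECTORS` are toy values, and the function names `index_terms`, `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math

# Hypothetical linear models, one per index term: (word weights, bias).
# In practice the weights would be learned from MeSH-indexed abstracts.
TERM_MODELS = {
    "Heart Diseases": ({"heart": 1.0, "cardiac": 1.0, "bone": -1.0}, -0.5),
    "Fractures, Bone": ({"bone": 1.0, "fracture": 1.0, "heart": -1.0}, -0.5),
}

# Toy 2-dimensional word vectors (illustrative values, not trained embeddings).
WORD_VECTORS = {
    "heart": [1.0, 0.0],
    "cardiac": [0.9, 0.1],
    "attack": [0.8, 0.2],
    "bone": [0.0, 1.0],
    "fracture": [0.1, 0.9],
}


def index_terms(text):
    """Assign every index term whose linear score (bias + w . x) is positive."""
    words = text.lower().split()
    assigned = []
    for term, (weights, bias) in TERM_MODELS.items():
        score = bias + sum(weights.get(word, 0.0) for word in words)
        if score > 0.0:
            assigned.append(term)
    return assigned


def embed(text):
    """Represent a text as the average of the vectors of its known words."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def retrieve(query, documents):
    """Keep documents sharing an index term with the query, then rank them."""
    query_terms = set(index_terms(query))
    candidates = [d for d in documents if query_terms & set(index_terms(d))]
    query_vector = embed(query)
    return sorted(
        candidates,
        key=lambda d: cosine(query_vector, embed(d)),
        reverse=True,
    )


docs = ["bone fracture", "cardiac arrest after heart attack", "cardiac attack"]
results = retrieve("heart attack", docs)
```

In this sketch the index terms act as a coarse filter that removes unrelated documents before any vector comparison, so the more expensive similarity ranking runs only on the isolated subset.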
Anthology ID:
W17-8001
Volume:
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Venues:
RANLP | WS
Publisher:
INCOMA Ltd.
Pages:
1–7
URL:
https://doi.org/10.26615/978-954-452-044-1_001
DOI:
10.26615/978-954-452-044-1_001
PDF:
https://doi.org/10.26615/978-954-452-044-1_001