Evaluation of a Sequence Tagging Tool for Biomedical Texts

Julien Tourille, Matthieu Doutreligne, Olivier Ferret, Aurélie Névéol, Nicolas Paris, Xavier Tannier


Abstract
Many applications in biomedical natural language processing rely on sequence tagging as an initial step before more complex analysis. To support text analysis in the biomedical domain, we introduce Yet Another SEquence Tagger (YASET), an open-source, multipurpose sequence tagger that implements state-of-the-art deep learning algorithms for sequence tagging. Herein, we evaluate YASET on part-of-speech tagging and named entity recognition across a variety of text genres, including articles from the biomedical literature in English and clinical narratives in French. To further characterize performance, we report score distributions over 30 runs and across training sets of different sizes. YASET achieves state-of-the-art performance on the CoNLL 2003 NER dataset (F1=0.87), the MEDPOST corpus (F1=0.97), the MERLoT corpus (F1=0.99) and the NCBI disease corpus (F1=0.81). We believe that YASET is a versatile and efficient tool for sequence tagging in biomedical and clinical texts.
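The abstract does not describe how the named entity recognition F1 scores are computed. The sketch below assumes the common CoNLL-style convention: exact-match, typed entity spans extracted from BIO tags. It is not YASET's own scorer, and the names `bio_spans` and `entity_f1` are hypothetical. An `I-` tag that does not continue an open entity of the same type starts a new span, which is the lenient reading used by many CoNLL-style scorers.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract typed entity spans from a BIO sequence such as B-PER, I-PER, O."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, label = tag.partition("-")
        # Close the open span on O, on a new B-, or on an I- of a different type.
        if start is not None and (prefix != "I" or label != etype):
            spans.add((etype, start, i))
            start, etype = None, None
        # Open a span on B-, or on an I- that does not continue an open span.
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact-match entity spans across all sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)  # same type and same boundaries
        fp += len(ps - gs)  # predicted spans missing from the gold standard
        fn += len(gs - ps)  # gold spans the tagger missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]
    print(entity_f1(gold, pred))  # one correct span out of two on each side
```

In the usage example, the person span matches exactly. The second span has the right boundaries but the wrong type, so it counts as both a false positive and a false negative. Precision and recall are therefore both 0.5, and so is F1.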
Anthology ID: W18-5622
Volume: Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
Month: October
Year: 2018
Address: Brussels, Belgium
Venues: EMNLP | Louhi | WS
Publisher: Association for Computational Linguistics
Pages: 193–203
URL: https://www.aclweb.org/anthology/W18-5622
DOI: 10.18653/v1/W18-5622
PDF: http://aclanthology.lst.uni-saarland.de/W18-5622.pdf