The impact of simple feature engineering in multilingual medical NER

Rebecka Weegar, Arantza Casillas, Arantza Diaz de Ilarraza, Maite Oronoz, Alicia Pérez, Koldo Gojenola


Abstract
The goal of this paper is to examine the impact of simple feature engineering mechanisms before applying more sophisticated techniques to the task of medical NER. Papers that use scientifically sound techniques sometimes report raw baselines that could be improved by adding simple and cheap features. This work focuses on entity recognition in the clinical domain for three languages: English, Swedish and Spanish. The task is tackled using simple features, starting with the window size, capitalization and prefixes, and moving on to POS and semantic tags. This work demonstrates that a simple initial step of feature engineering can improve the baseline results significantly. Hence, the contributions of this paper are, first, a short list of guidelines well supported by experimental results on three languages and, second, a detailed description of the relevance of these features for medical NER.
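The feature families named above (a context window, capitalization, prefixes, and POS and semantic tags) can be illustrated with a small per-token feature extractor. The sketch below is not the authors' implementation. The function name `token_features`, the feature keys, the default window size and the prefix lengths are all hypothetical choices made for illustration. POS and semantic tags are assumed to come from some external tagger and are passed in as optional lists. Dictionaries of this shape are the input format expected by many sequence labellers, for example CRF toolkits.

```python
from typing import Dict, List, Optional, Tuple


def token_features(
    tokens: List[str],
    i: int,
    pos_tags: Optional[List[str]] = None,
    sem_tags: Optional[List[str]] = None,
    window: int = 2,
    prefix_lengths: Tuple[int, ...] = (2, 3, 4),
) -> Dict[str, object]:
    """Build a feature dict for tokens[i].

    Hypothetical sketch: feature names, window size and prefix lengths are
    illustrative assumptions, not the settings used in the paper.
    """
    feats: Dict[str, object] = {"bias": 1.0}

    # Context window: lowercased word, capitalization and (optionally) POS and
    # semantic tags for each position from i - window to i + window.
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word = tokens[j]
            feats[f"w[{offset}]"] = word.lower()
            feats[f"cap[{offset}]"] = word[:1].isupper()
            if pos_tags is not None:
                feats[f"pos[{offset}]"] = pos_tags[j]
            if sem_tags is not None:
                feats[f"sem[{offset}]"] = sem_tags[j]
        else:
            # Position falls outside the sentence.
            feats[f"w[{offset}]"] = "<PAD>"

    # Prefixes of the current token (lowercased).
    word = tokens[i]
    for n in prefix_lengths:
        feats[f"prefix{n}"] = word[:n].lower()

    # Whole-token capitalization, e.g. for acronyms.
    feats["all_caps"] = word.isupper()
    return feats


if __name__ == "__main__":
    tokens = ["Patient", "denies", "chest", "pain", "."]
    pos = ["NN", "VBZ", "NN", "NN", "."]  # tags assumed to come from a tagger
    f = token_features(tokens, 2, pos_tags=pos, window=1)
    print(f["w[-1]"], f["w[0]"], f["w[1]"], f["prefix3"])
```

Growing the feature set step by step, first the window and surface cues and then POS and semantic tags, is simply a matter of passing more optional inputs to a function like this and retraining the labeller.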
Anthology ID: W16-4201
Volume: Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)
Month: December
Year: 2016
Address: Osaka, Japan
Venues: ClinicalNLP | WS
Publisher: The COLING 2016 Organizing Committee
Pages: 1–6
URL: https://www.aclweb.org/anthology/W16-4201
PDF: http://aclanthology.lst.uni-saarland.de/W16-4201.pdf