Transfer learning applied to text classification in Spanish radiological reports

Pilar López Úbeda, Manuel Carlos Díaz-Galiano, L. Alfonso Urena Lopez, Maite Martin, Teodoro Martín-Noguerol, Antonio Luna


Abstract
Pre-trained text encoders have rapidly advanced the state of the art on many Natural Language Processing (NLP) tasks. This paper applies transfer learning methods to the automatic detection of codes in Spanish radiological reports. Assigning codes to clinical documents is a common task in NLP and in the biomedical domain. These codes can be of two types: standard classifications (e.g. ICD-10) or codes specific to each clinic or hospital. In this study we present a system that uses clinic-specific radiology codes. The dataset is composed of 208,167 radiology reports labeled with 89 different codes. The corpus has been evaluated with three methods based on the BERT model applied to Spanish: Multilingual BERT, BETO and XLM. The results are promising, reaching an F1-score of 70% with a pre-trained multilingual model.
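As a rough illustration of the metric reported in the abstract, the following sketch computes a macro-averaged F1-score for code assignment. It assumes each report carries exactly one code and that F1 is averaged over codes; the abstract states neither, so this may differ from the authors' evaluation. The code labels and the function names `per_class_f1` and `macro_f1` are hypothetical.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for single-label predictions (assumed setup)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct code assigned
        else:
            fp[p] += 1          # predicted code was wrong
            fn[g] += 1          # true code was missed
    scores = {}
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-code F1 values."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


# Hypothetical codes for four reports; not taken from the paper's dataset.
gold = ["CT_HEAD", "CT_HEAD", "MR_KNEE", "XR_CHEST"]
pred = ["CT_HEAD", "MR_KNEE", "MR_KNEE", "XR_CHEST"]
score = macro_f1(gold, pred)
```

With 89 codes of very different frequencies, a macro average weights rare codes as heavily as common ones, whereas a micro average would be dominated by the frequent codes; which of these the reported 70% refers to is not specified here.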
Anthology ID:
2020.multilingualbio-1.5
Volume:
Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultilingualBIO 2020)
Month:
May
Year:
2020
Address:
Marseille, France
Venues:
LREC | MultilingualBIO | WS
Publisher:
European Language Resources Association
Pages:
29–32
Language:
English
URL:
https://www.aclweb.org/anthology/2020.multilingualbio-1.5
PDF:
http://aclanthology.lst.uni-saarland.de/2020.multilingualbio-1.5.pdf