BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining

Zachariah Zhang, Jingshu Liu, Narges Razavian


Abstract
ICD coding is the task of classifying and coding all diagnoses, symptoms and procedures associated with a patient’s visit. The process is often manual, extremely time-consuming and expensive for hospitals, as clinical interactions are usually recorded in free-text medical notes. In this paper, we propose a machine learning model, BERT-XML, for large-scale automated ICD coding of EHR notes, utilizing recently developed unsupervised pretraining methods that have achieved state-of-the-art performance on a variety of NLP tasks. We train a BERT model from scratch on EHR notes, learning a vocabulary better suited to EHR tasks and thus outperforming off-the-shelf models. We further adapt the BERT architecture for ICD coding with multi-label attention. We demonstrate the effectiveness of BERT-based models on the large-scale ICD code classification task, using millions of EHR notes to predict thousands of unique codes.
Anthology ID:
2020.clinicalnlp-1.3
Volume:
Proceedings of the 3rd Clinical Natural Language Processing Workshop
Month:
November
Year:
2020
Address:
Online
Venues:
ClinicalNLP | EMNLP
Publisher:
Association for Computational Linguistics
Pages:
24–34
URL:
https://www.aclweb.org/anthology/2020.clinicalnlp-1.3
DOI:
10.18653/v1/2020.clinicalnlp-1.3
PDF:
http://aclanthology.lst.uni-saarland.de/2020.clinicalnlp-1.3.pdf