Jingshu Liu


2020

BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining
Zachariah Zhang | Jingshu Liu | Narges Razavian
Proceedings of the 3rd Clinical Natural Language Processing Workshop

ICD coding is the task of classifying and coding all diagnoses, symptoms and procedures associated with a patient’s visit. The process is often manual, extremely time-consuming and expensive for hospitals as clinical interactions are usually recorded in free text medical notes. In this paper, we propose a machine learning model, BERT-XML, for large scale automated ICD coding of EHR notes, utilizing recently developed unsupervised pretraining that have achieved state of the art performance on a variety of NLP tasks. We train a BERT model from scratch on EHR notes, learning with vocabulary better suited for EHR tasks and thus outperform off-the-shelf models. We further adapt the BERT architecture for ICD coding with multi-label attention. We demonstrate the effectiveness of BERT-based models on the large scale ICD code classification task using millions of EHR notes to predict thousands of unique codes.

2018

Towards a unified framework for bilingual terminology extraction of single-word and multi-word terms
Jingshu Liu | Emmanuel Morin | Sebastián Peña Saldarriaga
Proceedings of the 27th International Conference on Computational Linguistics

Extracting a bilingual terminology for multi-word terms from comparable corpora has not been widely researched. In this work we propose a unified framework for aligning bilingual terms independently of the term lengths. We also introduce some enhancements to the context-based and the neural network based approaches. Our experiments show the effectiveness of our enhancements over previous work and that the system can be adapted to specialized domains.

Alignement de termes de longueur variable en corpus comparables spécialisés (Alignment of variable length terms in specialized comparable corpora)
Jingshu Liu | Emmanuel Morin | Sebastián Peña Saldarriaga
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

In this article we propose an adaptation of the extended compositional approach that can align terms of variable length from comparable corpora by modifying the representation of complex terms. We also propose new weighting schemes for the standard approach that improve on the results of state-of-the-art approaches for single-word and multi-word terms in specialized domains.