Experiments with ad hoc ambiguous abbreviation expansion

Agnieszka Mykowiecka, Malgorzata Marciniak


Abstract
The paper describes experiments on expanding ad hoc ambiguous abbreviations in medical notes on the basis of morphologically annotated texts, without using additional domain resources. We work on Polish data, but the described approaches can be used for other languages too. We test two methods of selecting candidate expansions for word abbreviations. The first automatically selects all words in the text that might be an expansion of an abbreviation according to the language rules. The second clusters abbreviation occurrences and selects representative elements, which are manually annotated to determine lists of potential expansions. We then train a classifier to assign expansions to abbreviations using three training sets: one obtained automatically, one consisting of manual annotations, and one that concatenates the two. The results obtained with the manually annotated training data significantly outperform those obtained with the automatically obtained training data. Adding the automatically obtained training data to the manually annotated data improves the results, in particular for less frequent abbreviations. In this context, the proposed a priori, data-driven selection of possible expansions turned out to be crucial.
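As a rough illustration of the first candidate-selection method and of the three training sets, here is a minimal sketch. The abstract does not state the matching rule, so the rule below (same first letter, abbreviation letters appearing in order in a longer word) is an assumption. The function names `is_candidate`, `candidate_expansions` and `build_training_sets` and the sample words are hypothetical, not taken from the paper.

```python
from typing import Dict, Iterable, List, Tuple


def is_candidate(abbrev: str, word: str) -> bool:
    """Assumed rule (not from the paper): the word is longer than the
    abbreviation, shares its first letter, and contains all of the
    abbreviation's letters in the same order."""
    abbrev = abbrev.rstrip(".").lower()
    word = word.lower()
    if not abbrev or len(word) <= len(abbrev) or word[0] != abbrev[0]:
        return False
    remaining = iter(word)
    # `ch in remaining` consumes the iterator, so this is a subsequence test.
    return all(ch in remaining for ch in abbrev)


def candidate_expansions(abbrev: str, lemmas: Iterable[str]) -> List[str]:
    """Collect every lemma in the corpus vocabulary that passes the rule."""
    return sorted({w for w in lemmas if is_candidate(abbrev, w)})


Example = Tuple[str, str]  # (abbreviation in context, expansion label)


def build_training_sets(auto: List[Example],
                        manual: List[Example]) -> Dict[str, List[Example]]:
    """The three training sets named in the abstract: automatic, manual,
    and the concatenation of both."""
    return {
        "auto": list(auto),
        "manual": list(manual),
        "combined": list(auto) + list(manual),
    }


# Hypothetical vocabulary, for illustration only.
vocabulary = ["tabletka", "tablica", "krople", "tabl"]
candidates = candidate_expansions("tabl.", vocabulary)
```

Under the assumed rule, `candidates` holds the vocabulary words that could expand `tabl.`; a classifier would then choose among them for each occurrence.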
Anthology ID:
D19-6207
Volume:
Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)
Month:
November
Year:
2019
Address:
Hong Kong
Venues:
EMNLP | Louhi | WS
Publisher:
Association for Computational Linguistics
Pages:
44–53
URL:
https://www.aclweb.org/anthology/D19-6207
DOI:
10.18653/v1/D19-6207
PDF:
http://aclanthology.lst.uni-saarland.de/D19-6207.pdf