BioReddit: Word Embeddings for User-Generated Biomedical NLP

Marco Basaldella, Nigel Collier


Abstract
Word embeddings, in their different shapes and iterations, have changed the natural language processing research landscape in recent years. The biomedical text processing field is no stranger to this revolution; however, scholars in the field have largely trained their embeddings on scientific documents only, even when working on user-generated data. In this paper we show how training embeddings on a corpus of user-generated text collected from medical forums heavily influences performance on downstream tasks, outperforming embeddings trained either on general-purpose data or on scientific papers when applied to user-generated content.
Anthology ID:
D19-6205
Volume:
Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)
Month:
November
Year:
2019
Address:
Hong Kong
Venues:
EMNLP | Louhi | WS
Publisher:
Association for Computational Linguistics
Pages:
34–38
URL:
https://www.aclweb.org/anthology/D19-6205
DOI:
10.18653/v1/D19-6205
PDF:
http://aclanthology.lst.uni-saarland.de/D19-6205.pdf