Inanc Birol


pdf bib
In-domain Context-aware Token Embeddings Improve Biomedical Named Entity Recognition
Golnar Sheikhshabbafghi | Inanc Birol | Anoop Sarkar
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

Rapidly expanding volume of publications in the biomedical domain makes it increasingly difficult for a timely evaluation of the latest literature. That, along with a push for automated evaluation of clinical reports, present opportunities for effective natural language processing methods. In this study we target the problem of named entity recognition, where texts are processed to annotate terms that are relevant for biomedical studies. Terms of interest in the domain include gene and protein names, and cell lines and types. Here we report on a pipeline built on Embeddings from Language Models (ELMo) and a deep learning package for natural language processing (AllenNLP). We trained context-aware token embeddings on a dataset of biomedical papers using ELMo, and incorporated these embeddings in the LSTM-CRF model used by AllenNLP for named entity recognition. We show these representations improve named entity recognition for different types of biomedical named entities. We also achieve a new state of the art in gene mention detection on the BioCreative II gene mention shared task.


pdf bib
Graph-based Semi-supervised Gene Mention Tagging
Golnar Sheikhshab | Elizabeth Starks | Aly Karsan | Anoop Sarkar | Inanc Birol
Proceedings of the 15th Workshop on Biomedical Natural Language Processing