MoNERo: a Biomedical Gold Standard Corpus for the Romanian Language

Maria Mitrofan, Verginica Barbu Mititelu, Grigorina Mitrofan


Abstract
In an era when large amounts of data are generated daily in various fields, the biomedical field among others, linguistic resources can be exploited for various Natural Language Processing tasks. Moreover, an increasing number of biomedical documents are available in languages other than English. To extract information from free-text natural language resources, methods and tools are needed for a variety of languages. This paper presents the creation of the MoNERo corpus, a gold standard biomedical corpus for Romanian, annotated with both part-of-speech tags and named entities. MoNERo comprises 154,825 morphologically annotated tokens and 23,188 entity annotations belonging to four entity semantic groups corresponding to UMLS Semantic Groups.
Anthology ID:
W19-5008
Volume:
Proceedings of the 18th BioNLP Workshop and Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | BioNLP | WS
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
71–79
URL:
https://www.aclweb.org/anthology/W19-5008
DOI:
10.18653/v1/W19-5008
PDF:
http://aclanthology.lst.uni-saarland.de/W19-5008.pdf