Reconstructing NER Corpora: a Case Study on Bulgarian
Iva Marinova, Laska Laskova, Petya Osenova, Kiril Simov, Alexander Popov
Abstract
The paper reports on the use of deep learning methods for improving a Named Entity Recognition (NER) training corpus and for predicting and annotating new types in a test corpus. We show how the annotations in a type-based corpus of named entities (NE) were populated as occurrences within it, thus ensuring density of the training information. A deep learning model was adopted for discovering inconsistencies in the initial annotation and for learning new NE types. Evaluation results improve after data curation, randomization, and deduplication.
- Anthology ID: 2020.lrec-1.571
- Volume: Proceedings of the 12th Language Resources and Evaluation Conference
- Month: May
- Year: 2020
- Address: Marseille, France
- Venues: COLING | LREC
- Publisher: European Language Resources Association
- Pages: 4647–4652
- Language: English
- URL: https://www.aclweb.org/anthology/2020.lrec-1.571
- PDF: http://aclanthology.lst.uni-saarland.de/2020.lrec-1.571.pdf