Iva Marinova


2020

Reconstructing NER Corpora: a Case Study on Bulgarian
Iva Marinova | Laska Laskova | Petya Osenova | Kiril Simov | Alexander Popov
Proceedings of the 12th Language Resources and Evaluation Conference

The paper reports on the use of deep learning methods for improving a Named Entity Recognition (NER) training corpus and for predicting and annotating new types in a test corpus. We show how the annotations in a type-based corpus of named entities (NE) were populated as occurrences within it, thus ensuring density of the training information. A deep learning model was adopted for discovering inconsistencies in the initial annotation and for learning new NE types. Evaluation results improve after data curation, randomization, and deduplication.
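As a rough illustration of the deduplication and randomization steps mentioned in the abstract, the following Python sketch drops exact duplicate sentences and then shuffles the remainder with a fixed seed. The function name `dedupe_and_shuffle`, the (token, tag) sentence format, and the toy corpus are assumptions made for this example; the abstract does not describe the actual procedure in this detail.

```python
import random


def dedupe_and_shuffle(sentences, seed=0):
    """Drop exact duplicate sentences, then shuffle reproducibly.

    Hypothetical helper: each sentence is assumed to be a list of
    (token, tag) pairs. This is not the procedure from the paper.
    """
    seen = set()
    unique = []
    for sentence in sentences:
        key = tuple(sentence)  # lists are unhashable, tuples are not
        if key not in seen:
            seen.add(key)
            unique.append(sentence)
    rng = random.Random(seed)  # local generator, so the order is repeatable
    rng.shuffle(unique)
    return unique


# Toy corpus with one exact duplicate (illustrative data only).
corpus = [
    [("Sofia", "B-LOC"), ("is", "O"), ("large", "O")],
    [("Sofia", "B-LOC"), ("is", "O"), ("large", "O")],
    [("Ivan", "B-PER"), ("sings", "O")],
]
cleaned = dedupe_and_shuffle(corpus)
```

Deduplicating before any train/test split keeps identical sentences from appearing on both sides, which would otherwise inflate evaluation scores.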

2019

Evaluation of Stacked Embeddings for Bulgarian on the Downstream Tasks POS and NERC
Iva Marinova
Proceedings of the Student Research Workshop Associated with RANLP 2019

This paper reports on experiments with different stacks of word embeddings and evaluates their usefulness for Bulgarian downstream tasks such as Named Entity Recognition and Classification (NERC) and Part-of-Speech (POS) tagging. Word embeddings lie at the core of NLP development, and several key language models have been created over the last two years, such as FastText (CITATION), ELMo (CITATION), BERT (CITATION) and Flair (CITATION). Stacking, or combining, different word embeddings is another technique used in this paper and has not yet been reported for Bulgarian NERC. A well-established architecture, BiLSTM-CRF, is used for the sequence tagging task, and different pre-trained language models are combined in the embedding layer to determine which combination scores best.
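Stacking is commonly understood as concatenating, for each token, the vectors produced by several embedding models into one longer vector that is then fed to the tagger. The following minimal Python sketch shows that idea with plain dictionaries. The function `stack_embeddings`, the table names, and the toy vectors are hypothetical and are not taken from the paper's implementation.

```python
def stack_embeddings(token, tables, dims):
    """Concatenate the token's vector from each embedding table.

    Hypothetical helper: `tables` are plain dicts mapping tokens to
    lists of floats, and `dims` gives each table's vector size. A token
    missing from a table contributes a zero vector of that size.
    """
    stacked = []
    for table, dim in zip(tables, dims):
        stacked.extend(table.get(token, [0.0] * dim))
    return stacked


# Toy lookup tables standing in for two pre-trained models (made-up values).
fasttext_like = {"София": [0.1, 0.2]}
flair_like = {"София": [0.3, 0.4, 0.5]}

vec = stack_embeddings("София", [fasttext_like, flair_like], [2, 3])
unknown = stack_embeddings("Пловдив", [fasttext_like, flair_like], [2, 3])
```

The stacked vector's length is the sum of the component dimensions, so every combination of models changes the input size of the sequence tagger's first layer.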