Maria Ponomareva


2019

Char-RNN for Word Stress Detection in East Slavic Languages
Ekaterina Chernyak | Maria Ponomareva | Kirill Milintsevich
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects

We explore how well a sequence labeling approach, namely recurrent neural networks, is suited to the task of resource-poor, POS-tagging-free word stress detection in Russian, Ukrainian and Belarusian. We present new datasets annotated with word stress for the three languages, compare several RNN models trained on them, and explore possible applications of transfer learning to the task. We show that it is possible to train a model in a cross-lingual setting and that using additional languages improves the quality of the results.
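To make the "sequence labeling" framing concrete, here is a minimal sketch of how a stress-annotated word can be turned into a character sequence with one 0/1 label per character, which is the kind of input/target pair a character-level RNN tagger consumes. It assumes, as a hypothetical input convention not taken from the paper, that stress is marked by a combining acute accent (U+0301) placed right after the stressed vowel; the function names are illustrative only.

```python
from __future__ import annotations

# Assumed convention for this sketch: a combining acute accent follows the stressed vowel.
ACUTE = "\u0301"


def to_char_labels(marked_word: str) -> tuple[list[str], list[int]]:
    """Split a stress-marked word into characters and per-character 0/1 stress labels."""
    chars: list[str] = []
    labels: list[int] = []
    for ch in marked_word:
        if ch == ACUTE:
            if not chars:
                raise ValueError("stress mark appears before any letter")
            labels[-1] = 1  # the accent marks the preceding character as stressed
        else:
            chars.append(ch)
            labels.append(0)
    return chars, labels


def from_char_labels(chars: list[str], labels: list[int]) -> str:
    """Inverse of to_char_labels: re-insert the accent after each character labeled 1."""
    return "".join(c + (ACUTE if y else "") for c, y in zip(chars, labels))
```

For example, `to_char_labels("за\u0301мок")` yields the characters of "замок" with labels `[0, 1, 0, 0, 0]`, so the tagger's job is to predict that label sequence from the bare characters alone, without any POS tags.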

AGRR 2019: Corpus for Gapping Resolution in Russian
Maria Ponomareva | Kira Droganova | Ivan Smurov | Tatiana Shavrina
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

This paper provides a comprehensive overview of the gapping dataset for Russian, which consists of 7.5k sentences with gapping (as well as 15k relevant negative sentences) and comprises data from various genres: news, fiction, social media and technical texts. The dataset was prepared for the Automatic Gapping Resolution Shared Task for Russian (AGRR-2019), a competition aimed at stimulating the development of NLP tools and methods for processing ellipsis. In this paper, we pay special attention to the gapping resolution methods introduced within the shared task, as well as to an alternative test set which illustrates that our corpus is a diverse and representative subset of Russian gapping, sufficient for the effective use of machine learning techniques.

2017

Automated Word Stress Detection in Russian
Maria Ponomareva | Kirill Milintsevich | Ekaterina Chernyak | Anatoly Starostin
Proceedings of the First Workshop on Subword and Character Level Models in NLP

In this study we address the problem of automated word stress detection in Russian using character-level models and no part-of-speech taggers. We use a simple bidirectional RNN with LSTM nodes and achieve an accuracy of 90% or higher. We experiment with two training datasets and show that using data from an annotated corpus is much more efficient than using only a dictionary, since it allows the model to retain the context of the word and its morphological features.
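As an illustration of how a per-word accuracy figure can be derived from a character-level model, the sketch below assumes the network outputs one stress score per character, decodes exactly one stressed position per word by taking the highest score, and counts a word as correct when that position matches the gold one. Both the argmax decoding and this definition of accuracy are assumptions made for the sketch, not details confirmed by the abstract; the function names are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def decode_stress(scores: Sequence[float]) -> int:
    """Return the index of the single stressed character: the highest-scoring one.

    Assumption: each word carries exactly one primary stress, so we take the argmax
    of the per-character scores instead of thresholding each character independently.
    Ties are broken in favor of the earliest character.
    """
    if not scores:
        raise ValueError("cannot decode stress for an empty word")
    return max(range(len(scores)), key=lambda i: scores[i])


def word_accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of words whose predicted stress position equals the gold position."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold:
        raise ValueError("no words to evaluate")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

For instance, `decode_stress([0.1, 0.7, 0.05, 0.1, 0.05])` returns `1`, and `word_accuracy([1, 3, 0, 2], [1, 3, 2, 2])` returns `0.75`, since three of the four words receive the correct stress position.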