Artem Chernodub


2020

pdf bib
GECToR – Grammatical Error Correction: Tag, Not Rewrite
Kostiantyn Omelianchuk | Vitaliy Atrasevych | Artem Chernodub | Oleksandr Skurzhanskyi
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an F_0.5 of 65.3/66.5 on CONLL-2014 (test) and F_0.5 of 72.4/73.6 on BEA-2019 (test). Its inference speed is up to 10 times as fast as a Transformer-based seq2seq GEC system.

2019

pdf bib
TARGER: Neural Argument Mining at Your Fingertips
Artem Chernodub | Oleksiy Oliynyk | Philipp Heidenreich | Alexander Bondarenko | Matthias Hagen | Chris Biemann | Alexander Panchenko
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present TARGER, an open source neural argument mining framework for tagging arguments in free input texts and for keyword-based retrieval of arguments from an argument-tagged web-scale corpus. The currently available models are pre-trained on three recent argument mining datasets and enable the use of neural argument mining without any reproducibility effort on the user’s side. The open source code ensures portability to other domains and use cases.

pdf bib
RNN Embeddings for Identifying Difficult to Understand Medical Words
Hanna Pylieva | Artem Chernodub | Natalia Grabar | Thierry Hamon
Proceedings of the 18th BioNLP Workshop and Shared Task

Patients and their families often require a better understanding of medical information provided by doctors. We currently address this issue by improving the identification of difficult to understand medical words. We introduce novel embeddings received from RNN - FrnnMUTE (French RNN Medical Understandability Text Embeddings) which allow to reach up to 87.0 F1 score in identification of difficult words. We also note that adding pre-trained FastText word embeddings to the feature set substantially improves the performance of the model which classifies words according to their difficulty. We study the generalizability of different models through three cross-validation scenarios which allow testing classifiers in real-world conditions: understanding of medical words by new users, and classification of new unseen words by the automatic models. The RNN - FrnnMUTE embeddings and the categorization code are being made available for the research.