Jakub Náplava


pdf bib
Grammatical Error Correction in Low-Resource Scenarios
Jakub Náplava | Milan Straka
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Grammatical error correction in English is a long studied problem with many existing systems and datasets. However, there has been only a limited research on error correction of other languages. In this paper, we present a new dataset AKCES-GEC on grammatical error correction for Czech. We then make experiments on Czech, German and Russian and show that when utilizing synthetic parallel corpus, Transformer neural machine translation model can reach new state-of-the-art results on these datasets. AKCES-GEC is published under CC BY-NC-SA 4.0 license at http://hdl.handle.net/11234/1-3057, and the source code of the GEC model is available at https://github.com/ufal/low-resource-gec-wnut2019.

pdf bib
CUNI System for the Building Educational Applications 2019 Shared Task: Grammatical Error Correction
Jakub Náplava | Milan Straka
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Our submitted models are NMT systems based on the Transformer model, which we improve by incorporating several enhancements: applying dropout to whole source and target words, weighting target subwords, averaging model checkpoints, and using the trained model iteratively for correcting the intermediate translations. The system in the Restricted Track is trained on the provided corpora with oversampled “cleaner” sentences and reaches 59.39 F0.5 score on the test set. The system in the Low-Resource Track is trained from Wikipedia revision histories and reaches 44.13 F0.5 score. Finally, we finetune the system from the Low-Resource Track on restricted data and achieve 64.55 F0.5 score.


pdf bib
Diacritics Restoration Using Neural Networks
Jakub Náplava | Milan Straka | Pavel Straňák | Jan Hajič
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)