GECToR – Grammatical Error Correction: Tag, Not Rewrite

Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, Oleksandr Skurzhanskyi


Abstract
In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an F_0.5 of 65.3/66.5 on CoNLL-2014 (test) and an F_0.5 of 72.4/73.6 on BEA-2019 (test). Its inference speed is up to 10 times that of a Transformer-based seq2seq GEC system.
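The idea of tagging rather than rewriting can be illustrated with a minimal sketch: each input token receives one edit tag, and the corrected sentence is obtained by applying the tags left to right. The tag names used here ($KEEP, $DELETE, $REPLACE_x, $APPEND_x) and the function apply_tags are assumptions chosen for illustration; the abstract does not enumerate the transformations, and this is not the authors' implementation.

```python
from typing import List


def apply_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Apply one edit tag per source token and return the corrected tokens.

    Assumed (illustrative) tag vocabulary:
      $KEEP        keep the token unchanged
      $DELETE      drop the token
      $REPLACE_x   replace the token with x
      $APPEND_x    keep the token and insert x after it
    """
    if len(tokens) != len(tags):
        raise ValueError("expected exactly one tag per token")

    corrected: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            corrected.append(token)
        elif tag == "$DELETE":
            continue
        elif tag.startswith("$REPLACE_"):
            corrected.append(tag[len("$REPLACE_"):])
        elif tag.startswith("$APPEND_"):
            corrected.append(token)
            corrected.append(tag[len("$APPEND_"):])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return corrected


if __name__ == "__main__":
    source = ["He", "is", "student"]
    predicted_tags = ["$KEEP", "$APPEND_a", "$KEEP"]
    print(" ".join(apply_tags(source, predicted_tags)))
```

Because every token is tagged independently in a single encoder pass, a tagger of this kind avoids the token-by-token decoding loop of a seq2seq model, which is consistent with the speed-up reported in the abstract.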
Anthology ID:
2020.bea-1.16
Volume:
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
July
Year:
2020
Address:
Seattle, WA, USA → Online
Venues:
ACL | BEA | WS
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
163–170
URL:
https://www.aclweb.org/anthology/2020.bea-1.16
DOI:
10.18653/v1/2020.bea-1.16
PDF:
http://aclanthology.lst.uni-saarland.de/2020.bea-1.16.pdf