Minimally-Augmented Grammatical Error Correction
Roman Grundkiewicz, Marcin Junczys-Dowmunt
Abstract
There has been an increased interest in low-resource approaches to automatic grammatical error correction. We introduce Minimally-Augmented Grammatical Error Correction (MAGEC) that does not require any error-labelled data. Our unsupervised approach is based on a simple but effective synthetic error generation method based on confusion sets from inverted spell-checkers. In low-resource settings, we outperform the current state-of-the-art results for German and Russian GEC tasks by a large margin without using any real error-annotated training data. When combined with labelled data, our method can serve as an efficient pre-training technique.
- Anthology ID: D19-5546
- Volume: Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
- Month: November
- Year: 2019
- Address: Hong Kong, China
- Venues: EMNLP | WNUT | WS
- Publisher: Association for Computational Linguistics
- Pages: 357–363
- URL: https://www.aclweb.org/anthology/D19-5546
- DOI: 10.18653/v1/D19-5546
- PDF: http://aclanthology.lst.uni-saarland.de/D19-5546.pdf
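
The abstract describes generating synthetic errors from confusion sets. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: the confusion sets are hand-written toy entries standing in for lists derived from a spell-checker, and the names `CONFUSION_SETS`, `corrupt`, `make_training_pair`, and the `error_rate` parameter are all hypothetical.

```python
import random

# Hypothetical toy confusion sets. In a real setup these might be derived
# from a spell-checker's suggestion lists; these entries are hand-written.
CONFUSION_SETS = {
    "their": ["there", "they're"],
    "than": ["then"],
    "affect": ["effect"],
}


def corrupt(tokens, confusion_sets, error_rate, rng):
    """Return a noisy copy of tokens, swapping some words for confusable ones."""
    noisy = []
    for token in tokens:
        candidates = confusion_sets.get(token.lower())
        # Replace only words that have a confusion set, with probability error_rate.
        if candidates and rng.random() < error_rate:
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(token)
    return noisy


def make_training_pair(sentence, confusion_sets=CONFUSION_SETS,
                       error_rate=0.15, seed=None):
    """Build a (noisy source, clean target) pair from one clean sentence."""
    rng = random.Random(seed)
    tokens = sentence.split()  # whitespace tokenization, an assumption
    noisy = corrupt(tokens, confusion_sets, error_rate, rng)
    return " ".join(noisy), sentence


if __name__ == "__main__":
    source, target = make_training_pair(
        "their plan is better than ours", error_rate=1.0, seed=0
    )
    print(source)
    print(target)
```

Pairs built this way from clean monolingual text could serve as training data for a correction model without any error-annotated corpus, which is the setting the abstract targets. The substitution probability and tokenization here are placeholders chosen for illustration.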