Unsupervised Paraphrasing without Translation

Aurko Roy, David Grangier


Abstract
Paraphrasing is an important task demonstrating the ability to abstract semantic content from its surface form. Recent literature on automatic paraphrasing is dominated by methods leveraging machine translation (MT) as an intermediate step. This contrasts with humans, who can paraphrase without necessarily being bilingual. This work proposes to learn paraphrasing models only from a monolingual corpus. To that end, we propose a residual variant of the vector-quantized variational auto-encoder. Our experiments consider paraphrase identification, paraphrase generation, and paraphrasing for training set augmentation, comparing to supervised and unsupervised translation-based approaches. Monolingual paraphrasing is shown to outperform unsupervised translation in all contexts. The comparison with supervised MT is more mixed: monolingual paraphrasing is interesting for identification and augmentation, but supervised MT is superior for generation.
Anthology ID: P19-1605
Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2019
Address: Florence, Italy
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 6033–6039
URL: https://www.aclweb.org/anthology/P19-1605
DOI: 10.18653/v1/P19-1605
PDF: http://aclanthology.lst.uni-saarland.de/P19-1605.pdf