Multilingual Whispers: Generating Paraphrases with Translation
Christian Federmann, Oussama Elachqar, Chris Quirk
Abstract
Naturally occurring paraphrase data, such as multiple news stories about the same event, is a useful but rare resource. This paper compares translation-based paraphrase gathering using human, automatic, or hybrid techniques to monolingual paraphrasing by experts and non-experts. We gather translations, paraphrases, and empirical human quality assessments of these approaches. Neural machine translation techniques, especially when pivoting through related languages, provide a relatively robust source of paraphrases with diversity comparable to expert human paraphrases. Surprisingly, human translators do not reliably outperform neural systems. The resulting data release will not only be a useful test set, but will also allow additional explorations in translation and paraphrase quality assessments and relationships.
- Anthology ID: D19-5503
- Volume: Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
- Month: November
- Year: 2019
- Address: Hong Kong, China
- Venues: EMNLP | WNUT | WS
- Publisher: Association for Computational Linguistics
- Pages: 17–26
- URL: https://www.aclweb.org/anthology/D19-5503
- DOI: 10.18653/v1/D19-5503
- PDF: http://aclanthology.lst.uni-saarland.de/D19-5503.pdf