Filtering Back-Translated Data in Unsupervised Neural Machine Translation

Jyotsana Khatri, Pushpak Bhattacharyya


Abstract
Unsupervised neural machine translation (NMT) utilizes only monolingual data for training. The quality of back-translated data plays an important role in the performance of NMT systems. In back-translation, all generated pseudo parallel sentence pairs are not of the same quality. Taking inspiration from domain adaptation where in-domain sentences are given more weight in training, in this paper we propose an approach to filter back-translated data as part of the training process of unsupervised NMT. Our approach gives more weight to good pseudo parallel sentence pairs in the back-translation phase. We calculate the weight of each pseudo parallel sentence pair using sentence-wise round-trip BLEU score which is normalized batch-wise. We compare our approach with the current state of the art approaches for unsupervised NMT.
Anthology ID:
2020.coling-main.383
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
COLING
SIG:
Publisher:
International Committee on Computational Linguistics
Note:
Pages:
4334–4339
Language:
URL:
https://www.aclweb.org/anthology/2020.coling-main.383
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2020.coling-main.383.pdf