Effective Use of Target-side Context for Neural Machine Translation

Hideya Mino, Hitoshi Ito, Isao Goto, Ichiro Yamada, Takenobu Tokunaga


Abstract
In this paper, we deal with two problems in Japanese-English machine translation of news articles. The first problem is the quality of parallel corpora. Neural machine translation (NMT) systems suffer degraded performance when trained with noisy data. Because there is no clean Japanese-English parallel data for news articles, we build a novel parallel news corpus consisting of Japanese news articles translated into English in a content-equivalent manner. This is the first content-equivalent Japanese-English news corpus translated specifically for training NMT systems. The second problem involves the domain-adaptation technique. NMT systems suffer degraded performance when trained with mixed data having different features, such as noisy data and clean data. Though the existing methods try to overcome this problem by using tags for distinguishing the differences between corpora, it is not sufficient. We thus extend a domain-adaptation method using multi-tags to train an NMT model effectively with the clean corpus and existing parallel news corpora with some types of noise. Experimental results show that our corpus increases the translation quality, and that our domain-adaptation method is more effective for learning with the multiple types of corpora than existing domain-adaptation methods are.
Anthology ID:
2020.coling-main.396
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
COLING
SIG:
Publisher:
International Committee on Computational Linguistics
Note:
Pages:
4483–4494
Language:
URL:
https://www.aclweb.org/anthology/2020.coling-main.396
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2020.coling-main.396.pdf