Improving Autoregressive NMT with Non-Autoregressive Model

Long Zhou, Jiajun Zhang, Chengqing Zong


Abstract
Autoregressive neural machine translation (NMT) models are often used to teach non-autoregressive models via knowledge distillation. However, few studies have explored improving the quality of autoregressive translation (AT) using non-autoregressive translation (NAT). In this work, we propose a novel Encoder-NAD-AD framework for NMT that aims to boost AT with global information produced by an NAT model. Specifically, under the semantic guidance of the source-side context captured by the encoder, the non-autoregressive decoder (NAD) first learns to generate a target-side hidden state sequence in parallel. The autoregressive decoder (AD) then translates from left to right, conditioned on both the source-side and target-side hidden states. Since the AD has access to global information generated by the low-latency NAD, it is more likely to produce a better translation with little additional delay. Experiments on the WMT14 En-De, WMT16 En-Ro, and IWSLT14 De-En translation tasks demonstrate that our framework achieves significant improvements with only an 8% speed degradation relative to autoregressive NMT.
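The data flow described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: it has no learned parameters, it takes the target length as a given input, and it returns hidden states rather than tokens. All function names (`attend`, `position_query`, `encode`, `nad_decode`, `ad_decode`, `translate`) are hypothetical and chosen only for illustration.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention; memory vectors act as keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * m for q, m in zip(query, vec)) / scale for vec in memory]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * vec[i] for w, vec in zip(weights, memory)) / total
        for i in range(len(query))
    ]


def position_query(pos: int, dim: int) -> Vector:
    """Toy position-only query (assumption: stands in for the real NAD input)."""
    return [math.sin((pos + 1) * (i + 1) / dim) for i in range(dim)]


def encode(src: List[Vector]) -> List[Vector]:
    """Encoder: each source position attends over the whole source sentence."""
    return [[x + c for x, c in zip(vec, attend(vec, src))] for vec in src]


def nad_decode(src_states: List[Vector], tgt_len: int, dim: int) -> List[Vector]:
    """NAD: each target position depends only on the source, never on other
    target positions, so all positions could be computed in parallel."""
    states = []
    for pos in range(tgt_len):
        query = position_query(pos, dim)
        context = attend(query, src_states)
        states.append([q + c for q, c in zip(query, context)])
    return states


def ad_decode(src_states: List[Vector], nad_states: List[Vector],
              tgt_len: int, dim: int) -> List[Vector]:
    """AD: left to right; each step sees the source states, the full NAD
    sequence (global target-side information), and its own prefix."""
    prefix: List[Vector] = []
    state = [0.0] * dim  # assumed start-of-sentence state
    for _ in range(tgt_len):
        src_ctx = attend(state, src_states)
        nad_ctx = attend(state, nad_states)
        self_ctx = attend(state, prefix) if prefix else state
        state = [math.tanh(a + b + c)
                 for a, b, c in zip(src_ctx, nad_ctx, self_ctx)]
        prefix.append(state)
    return prefix


def translate(src: List[Vector], tgt_len: int) -> List[Vector]:
    """Encoder -> NAD -> AD pipeline over toy source embeddings."""
    dim = len(src[0])
    src_states = encode(src)
    nad_states = nad_decode(src_states, tgt_len, dim)
    return ad_decode(src_states, nad_states, tgt_len, dim)


source = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2], [-0.3, 0.4, 0.1, 0.0]]
hidden = translate(source, tgt_len=5)
```

The point of the sketch is the dependency structure: `nad_decode` has no loop-carried state, whereas `ad_decode` threads `state` and `prefix` through every step while still reading the complete NAD sequence at each position.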
Anthology ID:
2020.autosimtrans-1.4
Volume:
Proceedings of the First Workshop on Automatic Simultaneous Translation
Month:
July
Year:
2020
Address:
Seattle, Washington
Venues:
ACL | AutoSimTrans | WS
Publisher:
Association for Computational Linguistics
Pages:
24–29
URL:
https://www.aclweb.org/anthology/2020.autosimtrans-1.4
DOI:
10.18653/v1/2020.autosimtrans-1.4
PDF:
http://aclanthology.lst.uni-saarland.de/2020.autosimtrans-1.4.pdf
Video:
http://slideslive.com/38929920