On the Relation between Position Information and Sentence Length in Neural Machine Translation

Masato Neishi, Naoki Yoshinaga


Abstract
Long sentences have been one of the major challenges in neural machine translation (NMT). Although some approaches such as the attention mechanism have partially remedied the problem, we found that the current standard NMT model, Transformer, has difficulty in translating long sentences compared to the former standard, Recurrent Neural Network (RNN)-based model. One of the key differences of these NMT models is how the model handles position information which is essential to process sequential data. In this study, we focus on the position information type of NMT models, and hypothesize that relative position is better than absolute position. To examine the hypothesis, we propose RNN-Transformer which replaces positional encoding layer of Transformer by RNN, and then compare RNN-based model and four variants of Transformer. Experiments on ASPEC English-to-Japanese and WMT2014 English-to-German translation tasks demonstrate that relative position helps translating sentences longer than those in the training data. Further experiments on length-controlled training data reveal that absolute position actually causes overfitting to the sentence length.
Anthology ID:
K19-1031
Volume:
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
Month:
November
Year:
2019
Address:
Hong Kong, China
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Note:
Pages:
328–338
Language:
URL:
https://www.aclweb.org/anthology/K19-1031
DOI:
10.18653/v1/K19-1031
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/K19-1031.pdf