Hiroyuki Deguchi


Bilingual Subword Segmentation for Neural Machine Translation
Hiroyuki Deguchi | Masao Utiyama | Akihiro Tamura | Takashi Ninomiya | Eiichiro Sumita
Proceedings of the 28th International Conference on Computational Linguistics

This paper proposes a new subword segmentation method for neural machine translation, “Bilingual Subword Segmentation,” which tokenizes a sentence pair so as to minimize the difference between the number of subword units in a sentence and that of its translation. While existing subword segmentation methods tokenize a sentence without considering its translation, the proposed method tokenizes a sentence using subword units induced from bilingual sentences, which can be more favorable for machine translation. Evaluations on the WAT Asian Scientific Paper Excerpt Corpus (ASPEC) English-to-Japanese and Japanese-to-English translation tasks and the WMT14 English-to-German and German-to-English translation tasks show that bilingual subword segmentation improves the performance of Transformer neural machine translation by up to +0.81 BLEU.
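As a rough illustration of the core idea (not the authors' implementation), the sketch below picks, from candidate segmentations of a source sentence and its translation, the pair whose subword counts differ the least. The helper name and hand-made candidate lists are hypothetical; in practice the candidates would come from something like a unigram-LM n-best decode.

```python
# Hedged sketch: choose the source/target segmentation pair whose
# token counts are closest, per the abstract's minimization criterion.
from itertools import product

def pick_bilingual_segmentation(src_candidates, tgt_candidates):
    """src_candidates / tgt_candidates: lists of candidate segmentations,
    each a list of subword tokens (e.g. from an n-best subword decode)."""
    return min(
        product(src_candidates, tgt_candidates),
        key=lambda pair: abs(len(pair[0]) - len(pair[1])),
    )

# Toy example with hand-made candidates (illustrative only):
src = [["go", "_to", "_school"], ["g", "o", "_to", "_sch", "ool"]]
tgt = [["学校", "に", "行く"], ["学", "校", "に", "行", "く"]]
best_src, best_tgt = pick_bilingual_segmentation(src, tgt)
print(best_src, best_tgt)  # the pair with the closest subword counts
```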


Dependency-Based Self-Attention for Transformer NMT
Hiroyuki Deguchi | Akihiro Tamura | Takashi Ninomiya
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this paper, we propose a new Transformer neural machine translation (NMT) model that incorporates dependency relations into self-attention on both the source and target sides: dependency-based self-attention. The dependency-based self-attention is trained to attend to each token's modifiee under constraints derived from the dependency relations, inspired by Linguistically-Informed Self-Attention (LISA). While LISA was originally proposed for the Transformer encoder in semantic role labeling, this paper extends LISA to Transformer NMT by masking future information on words in the decoder-side dependency-based self-attention. Additionally, our dependency-based self-attention operates on subword units created by byte pair encoding. Experiments show that our model improves by 1.0 BLEU point over the baseline model on the WAT’18 Asian Scientific Paper Excerpt Corpus Japanese-to-English translation task.
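Below is a minimal, hedged sketch of the LISA-style mechanism the abstract describes, not the paper's code: one attention head is supervised to place its probability mass on each token's dependency head (modifiee), and on the decoder side a causal mask hides future positions. All names, shapes, and the toy dependency tree are illustrative assumptions.

```python
# Hedged sketch of dependency-based self-attention supervision (LISA-style).
import torch
import torch.nn.functional as F

def dependency_attention_loss(attn_scores, head_index, decoder_side=False):
    """attn_scores: (seq_len, seq_len) raw scores from one attention head.
    head_index:  (seq_len,) index of each token's dependency head.
    Returns a cross-entropy loss pushing row i's attention toward
    head_index[i]."""
    seq_len = attn_scores.size(0)
    if decoder_side:
        # Causal mask: a token may only attend to itself and earlier
        # tokens, so no future information leaks during decoding.
        future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), 1)
        attn_scores = attn_scores.masked_fill(future, float("-inf"))
    return F.cross_entropy(attn_scores, head_index)

# Toy usage: 4 tokens with backward-pointing gold heads (root attends
# to itself), so the decoder-side causal mask is compatible.
scores = torch.randn(4, 4)
heads = torch.tensor([0, 0, 1, 2])
loss = dependency_attention_loss(scores, heads, decoder_side=True)
print(loss.item())
```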