Meaningless yet meaningful: Morphology grounded subword-level NMT

Tamali Banerjee, Pushpak Bhattacharyya


Abstract
We explore the use of two independent subsystems, Byte Pair Encoding (BPE) and Morfessor, to produce the basic units for subword-level neural machine translation (NMT). We show that for linguistically distant language pairs, Morfessor-based segmentation produces significantly better translation quality than BPE. For close language pairs, however, BPE-based subword NMT may translate better than Morfessor-based subword NMT. We propose a combination of the two segmentation algorithms, Morfessor-BPE (M-BPE), which outperforms both baseline systems in terms of BLEU score. Our results are supported by experiments on three language pairs: English-Hindi, Bengali-Hindi, and English-Bengali.
Anthology ID:
W18-1207
Volume:
Proceedings of the Second Workshop on Subword/Character LEvel Models
Month:
June
Year:
2018
Address:
New Orleans
Venues:
NAACL | SCLeM | WS
Publisher:
Association for Computational Linguistics
Pages:
55–60
URL:
https://www.aclweb.org/anthology/W18-1207
DOI:
10.18653/v1/W18-1207
PDF:
http://aclanthology.lst.uni-saarland.de/W18-1207.pdf