Jiahui Geng


pdf bib
Improving Unsupervised Word-by-Word Translation with Language Model and Denoising Autoencoder
Yunsu Kim | Jiahui Geng | Hermann Ney
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Unsupervised learning of cross-lingual word embedding offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods to improve word-by-word translation of cross-lingual embeddings, using only monolingual corpora but without any back-translation. We integrate a language model for context-aware search, and use a novel denoising autoencoder to handle reordering. Our system surpasses state-of-the-art unsupervised translation systems without costly iterative training. We also analyze the effect of vocabulary size and denoising type on the translation performance, which provides better understanding of learning the cross-lingual word embedding and its usage in translation.

pdf bib
The RWTH Aachen University English-German and German-English Unsupervised Neural Machine Translation Systems for WMT 2018
Miguel Graça | Yunsu Kim | Julian Schamper | Jiahui Geng | Hermann Ney
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the unsupervised neural machine translation (NMT) systems of the RWTH Aachen University developed for the English ↔ German news translation task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). Our work is based on iterative back-translation using a shared encoder-decoder NMT model. We extensively compare different vocabulary types, word embedding initialization schemes and optimization methods for our model. We also investigate gating and weight normalization for the word embedding layer.