Jan Rosendahl


2019

Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron
Yunsu Kim | Hendrik Rosendahl | Nick Rossenbach | Jan Rosendahl | Shahram Khadivi | Hermann Ney
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our approach shows promising performance on sentence alignment recovery and the WMT 2018 parallel corpus filtering tasks with only a single model.
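
The abstract above describes training a multilayer perceptron on top of bilingual sentence embeddings to decide whether a pair is parallel. A minimal sketch of that idea in PyTorch follows; the feature set (concatenation, absolute difference, element-wise product) is a common choice for sentence-pair classifiers and is an assumption here, not necessarily the paper's exact setup.

```python
# Minimal sketch: an MLP that scores whether a (source, target) sentence-
# embedding pair is parallel. Feature construction below is an assumption.
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    def __init__(self, emb_dim: int, hidden_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4 * emb_dim, hidden_dim),  # [src; tgt; |src-tgt|; src*tgt]
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),            # single logit: parallel or not
        )

    def forward(self, src_emb: torch.Tensor, tgt_emb: torch.Tensor) -> torch.Tensor:
        feats = torch.cat(
            [src_emb, tgt_emb, (src_emb - tgt_emb).abs(), src_emb * tgt_emb],
            dim=-1,
        )
        return self.mlp(feats).squeeze(-1)

# Usage: score a batch of candidate pairs, keep the highest-scoring ones.
scorer = PairScorer(emb_dim=512)
src = torch.randn(8, 512)   # embeddings from the shared bilingual space
tgt = torch.randn(8, 512)
scores = torch.sigmoid(scorer(src, tgt))  # probability each pair is parallel
```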

The RWTH Aachen University Machine Translation Systems for WMT 2019
Jan Rosendahl | Christian Herold | Yunsu Kim | Miguel Graça | Weiyue Wang | Parnia Bahar | Yingbo Gao | Hermann Ney
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the neural machine translation systems developed at the RWTH Aachen University for the German-English, Chinese-English and Kazakh-English news translation tasks of the Fourth Conference on Machine Translation (WMT19). For all tasks, the final submitted system is based on the Transformer architecture. We focus on improving data filtering and fine-tuning as well as systematically evaluating interesting approaches like unigram language model segmentation and transfer learning. For the De-En task, none of the tested methods gave a significant improvement over last year's winning system and we end up with the same performance, resulting in 39.6% BLEU on newstest2019. In the Zh-En task, we show 1.3% BLEU improvement over our last year's submission, which we mostly attribute to the splitting of long sentences during translation. We further report results on the Kazakh-English task where we gain improvements of 11.1% BLEU over our baseline system. On the same task we present a recent transfer learning approach, which uses half of the free parameters of our submission system and performs on par with it.
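
The abstract mentions unigram language model segmentation. A minimal sketch of training such a subword model with the SentencePiece library follows; the paper does not specify tooling, so the file names and vocabulary size below are placeholders.

```python
# Minimal sketch: training a unigram-LM subword model with SentencePiece.
# Input file, vocab size and model prefix are illustrative placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="train.de-en.txt",   # raw training text, one sentence per line
    model_prefix="unigram",    # writes unigram.model / unigram.vocab
    vocab_size=32000,
    model_type="unigram",      # unigram LM segmentation (Kudo, 2018)
)

sp = spm.SentencePieceProcessor(model_file="unigram.model")
print(sp.encode("Maschinelle Übersetzung ist spannend.", out_type=str))
```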

2018

The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018
Julian Schamper | Jan Rosendahl | Parnia Bahar | Yunsu Kim | Arne Nix | Hermann Ney
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the statistical machine translation systems developed at RWTH Aachen University for the German→English, English→Turkish and Chinese→English translation tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural machine translation systems based on the Transformer architecture. Our main focus is on the German→English task, where we scored first with respect to all automatic metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important hyperparameters. In total we improve by 6.8% BLEU over our last year's submission and by 4.8% BLEU over the winning system of the 2017 German→English task. In the English→Turkish task, we show a 3.6% BLEU improvement over last year's winning system. We further report results on the Chinese→English task, where we improve by 2.2% BLEU on average over our baseline systems but stay behind the 2018 winning systems.
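
The abstract refers to ensembles of Transformer systems. A minimal sketch of the standard way such an ensemble is combined at decode time, by averaging per-step output distributions, follows; the model call signature is a hypothetical stand-in for whatever toolkit the systems were built with.

```python
# Minimal sketch: ensembling NMT models by averaging their next-token
# distributions at each decoding step. The call model(src, prefix) returning
# next-token logits is a hypothetical interface, not the paper's toolkit.
import torch

def ensemble_next_token_probs(models, src, prefix):
    """Average the next-token probability distributions over all models."""
    probs = [torch.softmax(model(src, prefix), dim=-1) for model in models]
    return torch.stack(probs).mean(dim=0)  # averaged distribution over vocab
```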

The RWTH Aachen University Filtering System for the WMT 2018 Parallel Corpus Filtering Task
Nick Rossenbach | Jan Rosendahl | Yunsu Kim | Miguel Graça | Aman Gokrani | Hermann Ney
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the submission of RWTH Aachen University for the De→En parallel corpus filtering task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use several rule-based, heuristic methods to preselect sentence pairs. These sentence pairs are scored with count-based and neural systems as language and translation models. In addition to single sentence-pair scoring, we further implement a simple redundancy-removing heuristic. Our best performing corpus filtering system relies on recurrent neural language models and translation models based on the Transformer architecture. A model trained on 10M randomly sampled tokens reaches a performance of 9.2% BLEU on newstest2018. Using our filtering and ranking techniques we achieve 34.8% BLEU.
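
The abstract describes rule-based, heuristic preselection of sentence pairs before model-based scoring. A minimal sketch of such a pre-filter follows; the specific rules and thresholds are illustrative and not the paper's values.

```python
# Minimal sketch: rule-based pre-filtering of sentence pairs of the kind the
# abstract describes. All thresholds below are illustrative assumptions.
def keep_pair(src: str, tgt: str) -> bool:
    src_toks, tgt_toks = src.split(), tgt.split()
    if not src_toks or not tgt_toks:
        return False                      # drop pairs with an empty side
    if max(len(src_toks), len(tgt_toks)) > 100:
        return False                      # drop overlong sentences
    ratio = len(src_toks) / len(tgt_toks)
    if ratio > 2.0 or ratio < 0.5:
        return False                      # drop implausible length ratios
    if src.strip() == tgt.strip():
        return False                      # drop copies (untranslated lines)
    return True

# Usage: preselect pairs, then hand the survivors to LM/TM scoring.
pairs = [("Das ist ein Test .", "This is a test ."), ("Hallo", "")]
kept = [p for p in pairs if keep_pair(*p)]
```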

2017

The RWTH Aachen University English-German and German-English Machine Translation System for WMT 2017
Jan-Thorsten Peter | Andreas Guta | Tamer Alkhouli | Parnia Bahar | Jan Rosendahl | Nick Rossenbach | Miguel Graça | Hermann Ney
Proceedings of the Second Conference on Machine Translation