Translation of Biomedical Documents with Focus on Spanish-English

Mirela-Stefania Duma, Wolfgang Menzel


Abstract
For the WMT 2018 shared task of translating documents pertaining to the Biomedical domain, we developed a scoring formula that uses an unsophisticated and effective method of weighting term frequencies and was integrated in a data selection pipeline. The method was applied on five language pairs and it performed best on Portuguese-English, where a BLEU score of 41.84 placed it third out of seven runs submitted by three institutions. In this paper, we describe our method and results with a special focus on Spanish-English where we compare it against a state-of-the-art method. Our contribution to the task lies in introducing a fast, unsupervised method for selecting domain-specific data for training models which obtain good results using only 10% of the general domain data.
Anthology ID:
W18-6444
Volume:
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Month:
October
Year:
2018
Address:
Belgium, Brussels
Venues:
EMNLP | WMT | WS
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Note:
Pages:
637–643
Language:
URL:
https://www.aclweb.org/anthology/W18-6444
DOI:
10.18653/v1/W18-6444
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W18-6444.pdf