Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions

Philipp Koehn, Francisco Guzmán, Vishrav Chaudhary, Juan Pino


Abstract
Following the WMT 2018 Shared Task on Parallel Corpus Filtering, we posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting 2% and 10% of the highest-quality data to be used to train machine translation systems. This year, the task tackled the low resource condition of Nepali-English and Sinhala-English. Eleven participants from companies, national research labs, and universities participated in this task.
Anthology ID:
W19-5404
Volume:
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | WMT | WS
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Note:
Pages:
54–72
Language:
URL:
https://www.aclweb.org/anthology/W19-5404
DOI:
10.18653/v1/W19-5404
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W19-5404.pdf