The OPUS Resource Repository: An Open Package for Creating Parallel Corpora and Machine Translation Services

Mikko Aulamo, Jörg Tiedemann


Abstract
This paper presents a flexible and powerful system for creating parallel corpora and for running neural machine translation services. Our package provides a scalable data repository backend with transparent data pre-processing pipelines and automatic alignment procedures that facilitate the compilation of extensive parallel data sets from a variety of sources. We also develop a web-based interface that serves as an intuitive frontend for end-users of the platform. The whole system can easily be distributed over virtual machines and implements a sophisticated permission system with secure connections and a flexible database for storing arbitrary metadata. In addition, we provide an interface for neural machine translation that can run as a service on virtual machines and that connects to the data repository software.
Anthology ID:
W19-6146
Volume:
Proceedings of the 22nd Nordic Conference on Computational Linguistics
Month:
September–October
Year:
2019
Address:
Turku, Finland
Venues:
NoDaLiDa | WS
Publisher:
Linköping University Electronic Press
Pages:
389–394
URL:
https://www.aclweb.org/anthology/W19-6146
PDF:
http://aclanthology.lst.uni-saarland.de/W19-6146.pdf