Latin-Spanish Neural Machine Translation: from the Bible to Saint Augustine

Eva Martínez Garcia, Álvaro García Tejedor


Abstract
Although historical texts can be found in several sources, they are usually available only in their original language, which makes them generally inaccessible. This paper presents the development of state-of-the-art Neural Machine Translation systems for the low-resourced Latin-Spanish language pair. First, we build a Transformer-based Machine Translation system on the Bible parallel corpus. Then, we build a comparable corpus from Saint Augustine texts and their translations. We use this corpus to study domain adaptation from the Bible texts to Saint Augustine’s works. Results show the difficulties of handling a low-resourced language such as Latin. First, we noticed the importance of having enough data, since the systems do not achieve high BLEU scores. Regarding domain adaptation, results show that using in-domain data helps systems achieve better translation quality. We also observed that a larger amount of data is needed to perform an effective vocabulary extension that includes in-domain vocabulary.
Anthology ID:
2020.lt4hala-1.14
Volume:
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages
Month:
May
Year:
2020
Address:
Marseille, France
Venues:
LREC | LT4HALA | WS
Publisher:
European Language Resources Association (ELRA)
Pages:
94–99
Language:
English
URL:
https://www.aclweb.org/anthology/2020.lt4hala-1.14
PDF:
http://aclanthology.lst.uni-saarland.de/2020.lt4hala-1.14.pdf