European Union Language Resources in Sketch Engine

Vít Baisa, Jan Michelfeit, Marek Medveď, Miloš Jakubíček


Abstract
Several parallel corpora built from European Union language resources are presented here. They were processed by state-of-the-art tools and made available for researchers in the corpus manager Sketch Engine. A completely new resource is introduced: EUR-Lex Corpus, being one of the largest parallel corpus available at the moment, containing 840 million English tokens and the largest language pair English-French has more than 25 million aligned segments (paragraphs).
Anthology ID:
L16-1445
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
2799–2803
Language:
URL:
https://www.aclweb.org/anthology/L16-1445
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/L16-1445.pdf