The OPUS Corpus - Parallel and Free: http://logos.uio.no/opus

Jörg Tiedemann, Lars Nygaard


Abstract
The OPUS corpus is a growing collection of translated documents collected from the internet. The current version contains about 30 million words in 60 languages. The entire corpus is sentence aligned and it also contains linguistic markup for certain languages.
Anthology ID:
L04-1174
Volume:
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Month:
May
Year:
2004
Address:
Lisbon, Portugal
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2004/pdf/320.pdf
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://www.lrec-conf.org/proceedings/lrec2004/pdf/320.pdf