The Little Prince in 26 Languages: Towards a Multilingual Neuro-Cognitive Corpus

Sabrina Stehwien, Lena Henke, John Hale, Jonathan Brennan, Lars Meyer


Abstract
We present the Le Petit Prince Corpus (LPPC), a multilingual resource for research in (computational) psycholinguistics and neurolinguistics. The corpus consists of the children's story The Little Prince in 26 languages. The dataset is currently being built using state-of-the-art methods for speech and language processing and electroencephalography (EEG). The planned release of the LPPC dataset will include raw text annotated with dependency graphs in the Universal Dependencies standard, a synthetic spoken subset that sounds near-natural, and EEG recordings. We will use this corpus to conduct neurolinguistic studies that generalize across a wide range of languages, overcoming the typological constraints of traditional approaches. Because the planned release of the LPPC will combine linguistic and EEG data for many languages using fully automatic methods, it constitutes a readily extendable resource that supports cross-linguistic and cross-disciplinary research.
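The abstract states that the text will be annotated with dependency graphs in the Universal Dependencies (UD) standard. UD treebanks are commonly distributed in the tab-separated CoNLL-U format, in which each token line carries, among other columns, the index of its syntactic head and its dependency relation. The sketch below assumes that the LPPC annotation follows CoNLL-U (the abstract does not say so). The example sentence, its annotation, and the helper names `parse_conllu` and `dependency_arcs` are hypothetical illustrations, not material from the corpus.

```python
from __future__ import annotations

from typing import NamedTuple


class Token(NamedTuple):
    id: int       # 1-based token index within the sentence
    form: str     # surface word form
    lemma: str
    upos: str     # universal part-of-speech tag
    head: int     # index of the syntactic head (0 = root)
    deprel: str   # UD dependency relation


# Hypothetical hand-written annotation in CoNLL-U (10 tab-separated columns);
# NOT taken from the LPPC release.
SAMPLE = "\n".join([
    "# text = The prince smiled.",
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tprince\tprince\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tsmiled\tsmile\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])


def parse_conllu(block: str) -> list[Token]:
    """Hypothetical helper: parse one CoNLL-U sentence into Token records."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            int(cols[6]), cols[7]))
    return tokens


def dependency_arcs(tokens: list[Token]) -> list[tuple[str, str, str]]:
    """Hypothetical helper: return (head form, relation, dependent form) arcs."""
    forms = {t.id: t.form for t in tokens}
    forms[0] = "ROOT"  # the artificial root node has index 0
    return [(forms[t.head], t.deprel, t.form) for t in tokens]


if __name__ == "__main__":
    for arc in dependency_arcs(parse_conllu(SAMPLE)):
        print(arc)
    # prints ('prince', 'det', 'The'), ('smiled', 'nsubj', 'prince'),
    # ('ROOT', 'root', 'smiled'), ('smiled', 'punct', '.') on separate lines
```

Because each token stores only its head index, the dependency graph is recovered by looking up the head's form; a real pipeline would more likely use an existing CoNLL-U reader than this hand-rolled parser.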
Anthology ID: 2020.lincr-1.6
Volume: Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources
Month: May
Year: 2020
Address: Marseille, France
Venues: LREC | LiNCr | WS
Publisher: European Language Resources Association
Pages: 43–49
Language: English
URL: https://www.aclweb.org/anthology/2020.lincr-1.6
PDF: http://aclanthology.lst.uni-saarland.de/2020.lincr-1.6.pdf