Designing French Tale Corpora for Entertaining Text To Speech Synthesis

David Doukhan, Sophie Rosset, Albert Rilliard, Christophe d’Alessandro, Martine Adda-Decker


Abstract
Text and speech corpora for training a tale telling robot have been designed, recorded and annotated. The aim of these corpora is to study expressive storytelling behaviour, and to help in designing expressive prosodic and co-verbal variations for the artificial storyteller). A set of 89 children tales in French serves as a basis for this work. The tales annotation principles and scheme are described, together with the corpus description in terms of coverage and inter-annotator agreement. Automatic analysis of a new tale with the help of this corpus and machine learning is discussed. Metrics for evaluation of automatic annotation methods are discussed. A speech corpus of about 1 hour, with 12 tales has been recorded and aligned and annotated. This corpus is used for predicting expressive prosody in children tales, above the level of the sentence.
Anthology ID:
L12-1520
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
1003–1010
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/876_Paper.pdf
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/876_Paper.pdf