Benjamin Maza


2012

DECODA: a call-centre human-human spoken conversation corpus
Frederic Bechet | Benjamin Maza | Nicolas Bigouroux | Thierry Bazillon | Marc El-Bèze | Renato De Mori | Eric Arbillot
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The goal of the DECODA project is to reduce the development cost of Speech Analytics systems by reducing the need for manual annotation. This project aims to propose robust speech data mining tools in the framework of call-center monitoring and evaluation, by means of weakly supervised methods. The applicative framework of the project is the call-center of the RATP (Paris public transport authority). This project tackles two very important open issues in the development of speech mining methods from spontaneous speech recorded in call-centers: robustness (how to extract relevant information from very noisy and spontaneous speech messages) and weak supervision (how to reduce the annotation effort needed to train and adapt recognition and classification models). This paper describes the DECODA corpus collected at the RATP during the project. We present the different annotation levels performed on the corpus, the methods used to obtain them, as well as some evaluation of the quality of the annotations produced.