“Cheese!”: a Corpus of Face-to-face French Interactions. A Case Study for Analyzing Smiling and Conversational Humor
Béatrice Priego-Valverde | Brigitte Bigi | Mary Amoyal
Proceedings of the 12th Language Resources and Evaluation Conference

Cheese! is a conversational corpus. It consists of 11 French face-to-face conversations lasting around 15 minutes each. Cheese! replicates an American corpus (ref) in order to enable a cross-cultural comparison of participants’ smiling behavior in humorous and non-humorous sequences in American English and French conversations. This article presents the methodology used to collect and enrich the corpus: experimental protocol, technical choices, transcription, semi-automatic annotations, and manual annotations of smiling and humor. An exploratory study investigating the links between smiling and humor is then proposed. Based on the analysis of two interactions, two questions are asked: (1) Does smiling frame humor? (2) Does smiling have an impact on the success or failure of humor? Although the experimental design of Cheese! was developed specifically to study smiles and humor in conversations, the high quality of the resulting dataset and the methodology used are replicable and can be applied to the analysis of many other conversational activities and other modalities.

PACO: a Corpus to Analyze the Impact of Common Ground in Spontaneous Face-to-Face Interaction
Mary Amoyal | Béatrice Priego-Valverde | Stephane Rauzy
Proceedings of the 12th Language Resources and Evaluation Conference

PACO is a French audio-video conversational corpus made of 15 face-to-face dyadic interactions, lasting around 20 minutes each. This comparison corpus was created in order to explore the impact of the lack of personal common ground (Clark, 1996) on participants’ collaboration during conversation, and specifically on their smiles during topic transitions. We built this conversational corpus, “PACO”, by replicating the experimental protocol of “Cheese!” (Priego-Valverde et al., 2018). The only difference between these two corpora is the interlocutors’ degree of common ground (CG): in Cheese! the interlocutors are friends, while in PACO they do not know each other. This experimental protocol makes it possible to analyze how the participants get acquainted. This study makes two main contributions. First, the PACO conversational corpus makes it possible to compare the impact of the interlocutors’ common ground. Second, the semi-automatic smile annotation protocol yields reliable and reproducible smile annotations while reducing the annotation time by a factor of 10. Keywords: common ground, spontaneous interaction, smile, automatic detection.


The OTIM Formal Annotation Model: A Preliminary Step before Annotation Scheme
Philippe Blache | Roxane Bertrand | Mathilde Guardiola | Marie-Laure Guénot | Christine Meunier | Irina Nesterenko | Berthille Pallaud | Laurent Prévot | Béatrice Priego-Valverde | Stéphane Rauzy
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Large annotation projects, typically those addressing multimodal annotation, in which many different kinds of information have to be encoded, need to develop precise, high-level annotation schemes. Doing this first requires defining the structure of the information: the different objects and their organization. This stage has to be as independent as possible from the constraints of the coding language. For this reason, we propose a preliminary formal annotation model, represented with typed feature structures. This representation requires a precise definition of the different objects, their properties (or features) and their relations, represented in terms of type hierarchies. This approach has been used to specify the annotation scheme of a large multimodal annotation project (OTIM) and has been tested in the annotation of a multimodal corpus (CID, Corpus of Interactional Data). This project aims at collecting, annotating and exploiting a dialogue video corpus from a multimodal perspective (including speech and gesture modalities). The corpus itself consists of 8 hours of dialogues, fully transcribed and richly annotated (phonetics, syntax, pragmatics, gestures, etc.).