The ISO Standard for Dialogue Act Annotation, Second Edition
Harry Bunt | Volha Petukhova | Emer Gilmartin | Catherine Pelachaud | Alex Fang | Simon Keizer | Laurent Prévot
Proceedings of the 12th Language Resources and Evaluation Conference

ISO standard 24617-2 for dialogue act annotation, established in 2012, has in the past few years been used both in corpus annotation and in the design of components for spoken and multimodal dialogue systems. This has brought to light some inaccuracies and undesirable limitations of the standard, which are addressed in a proposed second edition. This second edition allows a more accurate annotation of dependence relations and rhetorical relations in dialogue. Following the ISO 24617-4 principles of semantic annotation, and borrowing ideas from EmotionML, a triple-layered plug-in mechanism is introduced which allows dialogue act descriptions to be enriched with information about their semantic content, about accompanying emotions, and other information, and allows the annotation scheme to be customised by adding application-specific dialogue act types.

Multimodal Corpus of Bidirectional Conversation of Human-human and Human-robot Interaction during fMRI Scanning
Birgit Rauchbauer | Youssef Hmamouche | Brigitte Bigi | Laurent Prévot | Magalie Ochs | Thierry Chaminade
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper we present an investigation of real-life, bidirectional conversations. We introduce the multimodal corpus derived from these natural conversations, which alternate between human-human and human-robot interactions. The human-robot interactions were used as a control condition for the social nature of the human-human conversations. The experimental setup consisted of conversations between a participant in a functional magnetic resonance imaging (fMRI) scanner and a human confederate or conversational robot outside the scanner room, connected via bidirectional audio and unidirectional videoconferencing (from outside to inside the scanner). A cover story provided a framework for natural, real-life conversations about images of an advertisement campaign. During the conversations we collected a multimodal corpus for a comprehensive characterization of bidirectional conversations. The corpus includes neural data from fMRI, physiological data (blood flow pulse and respiration), transcribed conversational data, as well as face and eye-tracking recordings. Thus, we present a unique corpus to study human conversations including neural, physiological and behavioral data.

BrainPredict: a Tool for Predicting and Visualising Local Brain Activity
Youssef Hmamouche | Laurent Prévot | Magalie Ochs | Thierry Chaminade
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we present a tool allowing dynamic prediction and visualization of an individual’s local brain activity during a conversation. The prediction module of this tool is based on classifiers trained using a corpus of human-human and human-robot conversations including fMRI recordings. More precisely, the module takes as input behavioral features computed from raw data, mainly the participant’s and the interlocutor’s speech but also the participant’s visual input and eye movements. The visualisation module shows in real time the dynamics of active brain areas synchronised with the behavioral raw data. In addition, it shows which integrated behavioral features are used to predict the activity in individual brain areas.

Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Emmanuele Chersoni | Cassandra Jacobs | Yohei Oseki | Laurent Prévot | Enrico Santus

Filtering conversations through dialogue acts labels for improving corpus-based convergence studies
Simone Fuscone | Benoit Favre | Laurent Prévot
Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue

Cognitive models of conversation and research on user-adaptation in dialogue systems involve a better understanding of speakers’ convergence in conversation. Convergence effects have been established on controlled data sets, for various acoustic and linguistic variables. Tracking interpersonal dynamics on generic corpora has provided positive but more mixed outcomes. We propose here to enrich large conversational corpora with dialogue act (DA) information. We use DA labels as filters in order to create data subsets featuring homogeneous conversational activity. Those data sets allow a more precise comparison between speakers’ speech variables. Our experiments consist of comparing convergence on low-level variables (Energy, Pitch, Speech Rate) measured on raw data sets with convergence measured on human and automatically DA-labelled data sets. We found that such filtering does help in observing convergence, suggesting that studies on interpersonal dynamics should consider such high-level dialogue activity types and their related NLP topics as important ingredients of their toolboxes.


Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Emmanuele Chersoni | Cassandra Jacobs | Alessandro Lenci | Tal Linzen | Laurent Prévot | Enrico Santus

Downward Compatible Revision of Dialogue Annotation
Harry Bunt | Emer Gilmartin | Simon Keizer | Catherine Pelachaud | Volha Petukhova | Laurent Prévot | Mariët Theune
Proceedings 14th Joint ACL - ISO Workshop on Interoperable Semantic Annotation


LexFr: Adapting the LexIt Framework to Build a Corpus-based French Subcategorization Lexicon
Giulia Rambelli | Gianluca Lebani | Laurent Prévot | Alessandro Lenci
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces LexFr, a corpus-based French lexical resource built by adapting the LexIt framework, originally developed to describe the combinatorial potential of Italian predicates. As in the original framework, the behavior of a group of target predicates is characterized by syntactic (i.e., subcategorization frames) and semantic (i.e., selectional preferences) statistical information (a.k.a. distributional profiles) whose extraction process is mostly unsupervised. The first release of LexFr includes information for 2,493 verbs, 7,939 nouns and 2,628 adjectives. In these pages we describe the adaptation process and evaluate the final resource by comparing the information collected for 20 test verbs against the information available in a gold standard dictionary. In the best performing setting, we obtained 0.74 precision, 0.66 recall and 0.70 F-measure.
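
The reported scores can be checked against each other, assuming the F-measure in the abstract is the balanced F1 (the harmonic mean of precision and recall). The abstract does not say which variant was used, so this is a minimal sketch under that assumption, and the function name `f_measure` is illustrative only:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
score = f_measure(0.74, 0.66)
print(round(score, 2))
```

Under this assumption the computed value rounds to the reported 0.70.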

4Couv: A New Treebank for French
Philippe Blache | Grégoire de Montcheuil | Laurent Prévot | Stéphane Rauzy
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The question of the type of text used as primary data in treebanks is of some importance. First, it has an influence at the discourse level: an article is not organized in the same way as a novel or a technical document. Moreover, it also has consequences in terms of semantic interpretation: some types of texts can be easier to interpret than others. We present in this paper a new type of treebank designed to meet the specific needs of experimental linguistics. It is made of short texts (book back covers) that present a strong coherence in their organization and can be rapidly interpreted. This type of text is suited to short reading sessions, making it easy to acquire physiological data (e.g. eye movement, electroencephalography). Such a resource offers reliable data when looking for correlations between computational models and human language processing.

A CUP of CoFee: A large Collection of feedback Utterances Provided with communicative function annotations
Laurent Prévot | Jan Gorisch | Roxane Bertrand
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

There have been several previous attempts to annotate utterances of verbal feedback in English with communicative functions. Here, we suggest an annotation scheme for verbal and non-verbal feedback utterances in French including the categories base, attitude, previous and visual. The data comprises conversations, map tasks and negotiations from which we extracted ca. 13,000 candidate feedback utterances and gestures. 12 students were recruited for the annotation campaign of ca. 9,500 instances. Each instance was annotated by between 2 and 7 raters. The evaluation of the annotation agreement resulted in an average best-pair kappa of 0.6. While the base category, with the values acknowledgement, evaluation, answer and elicit, achieves good agreement, this is not the case for the other main categories. The data sets, which also include automatic extractions of lexical, positional and acoustic features, are freely available and will further be used for machine learning classification experiments to analyse the form-function relationship of feedback.
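
To illustrate the kind of agreement figure mentioned above, here is a minimal sketch of Cohen's kappa for one pair of raters. The abstract does not specify which kappa variant was used or how the "best pair" was selected, so the function and the toy labels below are assumptions for illustration, not the authors' procedure:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same instances."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both raters must label the same non-empty set")
    # Observed agreement: share of instances with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[lab] * count_b[lab]
                   for lab in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations using values of the "base" category.
rater_1 = ["ack", "ack", "eval", "answer", "ack", "elicit"]
rater_2 = ["ack", "eval", "eval", "answer", "ack", "ack"]
print(cohen_kappa(rater_1, rater_2))
```

With several raters per instance, one option is to compute this score for every pair of raters and then aggregate; whether that matches the paper's best-pair measure is not stated here.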


A SIP of CoFee : A Sample of Interesting Productions of Conversational Feedback
Laurent Prévot | Jan Gorisch | Roxane Bertrand | Emilien Gorène | Brigitte Bigi
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Annotation and Classification of French Feedback Communicative Functions
Laurent Prévot | Jan Gorisch | Sankar Mukherjee
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation


Representing Multimodal Linguistic Annotated data
Brigitte Bigi | Tatsuya Watanabe | Laurent Prévot
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The question of interoperability for linguistic annotated resources covers different aspects. First, it requires a representation framework making it possible to compare, and eventually merge, different annotation schemas. In this paper, a general description level representing the multimodal linguistic annotations is proposed. It focuses on time representation and on the data content representation. This paper reconsiders and enhances the current and generalized representation of annotations. An XML schema of such annotations is proposed. A Python API is also proposed. This framework is implemented in multi-platform software and distributed under the terms of the GNU Public License.

Aix Map Task corpus: The French multimodal corpus of task-oriented dialogue
Jan Gorisch | Corine Astésano | Ellen Gurman Bard | Brigitte Bigi | Laurent Prévot
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper introduces the Aix Map Task corpus, a corpus of audio and video recordings of task-oriented dialogues. It was modelled after the original HCRC Map Task corpus. Lexical material was designed for the analysis of speech and prosody, as described in Astésano et al. (2007). The design of the lexical material, the protocol and some basic quantitative features of the existing corpus are presented. The corpus was collected under two communicative conditions, one audio-only condition and one face-to-face condition. The recordings took place in a studio and a sound-attenuated booth respectively, with head-set microphones (and in the face-to-face condition with two video cameras). The recordings have been segmented into Inter-Pausal-Units and transcribed using transcription conventions containing actual productions and canonical forms of what was said. The corpus is made publicly available online.

Segmentation evaluation metrics, a comparison grounded on prosodic and discourse units
Klim Peshkov | Laurent Prévot
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Knowledge of evaluation metrics and best practices for using them has improved fast in recent years (Fort et al., 2012). However, the advances mostly concern the evaluation of classification-related tasks. Segmentation tasks have received less attention. Nevertheless, they are crucial in a large number of linguistic studies. A range of metrics is available (F-score on boundaries, F-score on units, WindowDiff (WD), Boundary Similarity (BS)), but it is still relatively difficult to interpret these metrics on various linguistic segmentation tasks, such as prosodic and discourse segmentation. In this paper, we consider real segmented datasets (introduced in Peshkov et al., 2012) as references, which we deteriorate in different ways (random addition of boundaries, random removal of boundaries, introduction of near-miss errors). This provides us with various measures on controlled datasets and with an interesting benchmark for various linguistic segmentation tasks.
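
The deterioration procedure described above can be sketched as follows. This is a hedged illustration, not the authors' code: segmentations are assumed to be encoded as lists of 0/1 boundary indicators, `window_diff` follows the common sliding-window formulation of WindowDiff (count the windows in which the two segmentations disagree on the number of boundaries), and all function names are hypothetical.

```python
import random


def window_diff(ref, hyp, k):
    """Share of length-k windows where ref and hyp differ in boundary count."""
    if len(ref) != len(hyp):
        raise ValueError("segmentations must have the same length")
    if not 0 < k <= len(ref):
        raise ValueError("window size must be between 1 and the length")
    windows = len(ref) - k + 1
    errors = sum(1 for i in range(windows)
                 if sum(ref[i:i + k]) != sum(hyp[i:i + k]))
    return errors / windows


def add_boundaries(seg, n, rng):
    """Insert up to n boundaries at randomly chosen empty positions."""
    out = list(seg)
    empty = [i for i, b in enumerate(out) if b == 0]
    for i in rng.sample(empty, min(n, len(empty))):
        out[i] = 1
    return out


def remove_boundaries(seg, n, rng):
    """Delete up to n randomly chosen boundaries."""
    out = list(seg)
    filled = [i for i, b in enumerate(out) if b == 1]
    for i in rng.sample(filled, min(n, len(filled))):
        out[i] = 0
    return out


def near_miss(seg, n, rng):
    """Shift up to n boundaries by one position into an adjacent empty slot."""
    out = list(seg)
    candidates = [i for i, b in enumerate(out) if b == 1]
    rng.shuffle(candidates)
    moved = 0
    for i in candidates:
        if moved == n:
            break
        targets = [j for j in (i - 1, i + 1)
                   if 0 <= j < len(out) and out[j] == 0]
        if targets:
            j = rng.choice(targets)
            out[i], out[j] = 0, 1
            moved += 1
    return out


rng = random.Random(0)                      # fixed seed for repeatability
reference = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]  # toy reference segmentation
for degrade in (add_boundaries, remove_boundaries, near_miss):
    degraded = degrade(reference, 1, rng)
    print(degrade.__name__, window_diff(reference, degraded, 3))
```

Scoring each degraded copy against the untouched reference shows how a metric reacts to one controlled error type at a time, which is the idea behind the benchmark.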


Observing Features of PTT Neologisms: A Corpus-driven Study with N-gram Model
Tsun-Jui Liu | Shu-Kai Hsieh | Laurent Prevot
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)

A quantitative view of feedback lexical markers in conversational French
Laurent Prévot | Brigitte Bigi | Roxane Bertrand
Proceedings of the SIGDIAL 2013 Conference

A Quantitative Comparative Study of Prosodic and Discourse Units, the Case of French and Taiwan Mandarin
Laurent Prévot | Shu-Chuan Tseng | Alvin Cheng-Hsien Chen | Klim Peshkov
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)


An empirical resource for discovering cognitive principles of discourse organisation: the ANNODIS corpus
Stergos Afantenos | Nicholas Asher | Farah Benamara | Myriam Bras | Cécile Fabre | Mai Ho-dac | Anne Le Draoulec | Philippe Muller | Marie-Paule Péry-Woodley | Laurent Prévot | Josette Rebeyrolles | Ludovic Tanguy | Marianne Vergez-Couret | Laure Vieu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the ANNODIS resource, a discourse-level annotated corpus for French. The corpus combines two perspectives on discourse: a bottom-up approach and a top-down approach. The bottom-up view incrementally builds a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level discourse structures. The corpus is composed of texts that are diversified with respect to genre, length and type of discursive organisation. The methodology followed here involves an iterative design of annotation guidelines in order to reach satisfactory inter-annotator agreement levels. This allows us to raise a few issues relevant for the comparison of such complex objects as discourse structures. The corpus also serves as a source of empirical evidence for discourse theories. We present here two initial analyses taking advantage of this new annotated corpus: one that tested hypotheses on constraints governing discourse structure, and another that studied the variations in composition and signalling of multi-level discourse structures.


Computational Modeling of Verb Acquisition, from a Monolingual to a Bilingual Study
Laurent Prévot | Chun-Han Chang | Yann Desalle
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

A Formal Scheme for Multimodal Grammars
Philippe Blache | Laurent Prévot
Coling 2010: Posters

The OTIM Formal Annotation Model: A Preliminary Step before Annotation Scheme
Philippe Blache | Roxane Bertrand | Mathilde Guardiola | Marie-Laure Guénot | Christine Meunier | Irina Nesterenko | Berthille Pallaud | Laurent Prévot | Béatrice Priego-Valverde | Stéphane Rauzy
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Large annotation projects, typically those addressing the question of multimodal annotation in which many different kinds of information have to be encoded, have to elaborate precise and high-level annotation schemes. Doing this first requires defining the structure of the information: the different objects and their organization. This stage has to be as independent as possible from the constraints of the coding language. This is the reason why we propose a preliminary formal annotation model, represented with typed feature structures. This representation requires a precise definition of the different objects, their properties (or features) and their relations, represented in terms of type hierarchies. This approach has been used to specify the annotation scheme of a large multimodal annotation project (OTIM) and tested in the annotation of a multimodal corpus (CID, Corpus of Interactional Data). This project aims at collecting, annotating and exploiting a dialogue video corpus in a multimodal perspective (including speech and gesture modalities). The corpus itself is made of 8 hours of dialogues, fully transcribed and richly annotated (phonetics, syntax, pragmatics, gestures, etc.).


Using Extra-Linguistic Material for Mandarin-French Verbal Constructions Comparison
Pierre Magistry | Laurent Prévot | Hintat Cheung | Chien-yun Shiao | Yann Desalle | Bruno Gaume
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

Wiktionary for Natural Language Processing: Methodology and Limitations
Emmanuel Navarro | Franck Sajous | Bruno Gaume | Laurent Prévot | ShuKai Hsieh | Ivy Kuo | Pierre Magistry | Chu-Ren Huang
Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web)


Extracting Concrete Senses of Lexicon through Measurement of Conceptual Similarity in Ontologies
Siaw-Fong Chung | Laurent Prévot | Mingwei Xu | Kathleen Ahrens | Shu-Kai Hsieh | Chu-Ren Huang
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The measurement of conceptual similarity in a hierarchical structure has been proposed by studies such as Wu and Palmer (1994), which have been summarized and evaluated in Budanitsky and Hirst (2006). The present study applies the measurement of conceptual similarity to conceptual metaphor research by comparing the concreteness of ontological resource nodes to several prototypical concrete nodes selected by human subjects. Here, the purpose of comparing conceptual similarity between nodes is to select a concrete sense for a word which is used metaphorically. Using a WordNet-SUMO interface such as SinicaBow (Huang, Chang and Lee, 2004), concrete senses of a lexical item are selected once its SUMO nodes have been compared in terms of conceptual similarity with the prototypical concrete nodes. This study has strong implications for the interaction of psycholinguistic and computational linguistic fields in conceptual metaphor research.
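
The selection step described above can be sketched with the Wu and Palmer (1994) measure, which scores two nodes by the depth of their lowest common ancestor relative to their own depths. The hierarchy below is a hypothetical toy taxonomy, not SUMO or SinicaBow, the root is assumed to have depth 1, and the helper names are illustrative only:

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "physical": "entity",
    "abstract": "entity",
    "object": "physical",
    "artifact": "object",
    "organism": "object",
    "proposition": "abstract",
}


def path_to_root(node):
    """Return the chain of nodes from `node` up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Number of nodes on the path to the root (the root has depth 1)."""
    return len(path_to_root(node))


def lowest_common_ancestor(a, b):
    """Deepest node that lies on both paths to the root."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("nodes do not share a root")


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lca) / (depth(a) + depth(b))."""
    lca = lowest_common_ancestor(a, b)
    return 2 * depth(lca) / (depth(a) + depth(b))


def most_concrete(candidates, prototypes):
    """Pick the candidate node closest to any prototypical concrete node."""
    return max(candidates,
               key=lambda c: max(wu_palmer(c, p) for p in prototypes))


print(wu_palmer("artifact", "organism"))
print(most_concrete(["proposition", "artifact"], ["organism"]))
```

In this toy setting, a sense mapped to a node near the prototypical concrete nodes scores higher than a sense mapped to an abstract node, which mirrors the selection idea in the abstract.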

Toward a cognitive organization for electronic dictionaries, the case for semantic proxemy
Bruno Gaume | Karine Duvignau | Laurent Prévot | Yann Desalle
Coling 2008: Proceedings of the Workshop on Cognitive Aspects of the Lexicon (COGALEX 2008)


Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification
Chu-Ren Huang | Petr Šimon | Shu-Kai Hsieh | Laurent Prévot
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions


Infrastructure for Standardization of Asian Language Resources
Takenobu Tokunaga | Virach Sornlertlamvanich | Thatsanee Charoenporn | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Chu-Ren Huang | YingJu Xia | Hao Yu | Laurent Prevot | Kiyoaki Shirai
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Using the Swadesh list for creating a simple common taxonomy
Laurent Prévot | Chu-Ren Huang | I-Li Su
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation


Interfacing Ontologies and Lexical Resources
Laurent Prevot | Stefano Borgo | Alessandro Oltramari
Proceedings of OntoLex 2005 - Ontologies and Lexical Resources