Marie-Laure Guénot


2015

pdf bib
Création d’un nouveau treebank à partir de quatrièmes de couverture
Philippe Blache | Grégoire Moncheuil | Stéphane Rauzy | Marie-Laure Guénot
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Nous présentons ici 4-couv, un nouveau corpus arboré d’environ 3 500 phrases, constitué d’un ensemble de quatrièmes de couverture, étiqueté et analysé automatiquement puis corrigé et validé à la main. Il répond à des besoins spécifiques pour des projets de linguistique expérimentale, et vise à rester compatible avec les autres treebanks existants pour le français. Nous présentons ici le corpus lui-même ainsi que les outils utilisés pour les différentes étapes de son élaboration : choix des textes, étiquetage, parsing, correction manuelle.

2010

pdf bib
PASSAGE Syntactic Representation: a Minimal Common Ground for Evaluation
Anne Vilnat | Patrick Paroubek | Eric Villemonte de la Clergerie | Gil Francopoulo | Marie-Laure Guénot
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The current PASSAGE syntactic representation is the result of 9 years of constant evolution with the aim of providing a common ground for evaluating parsers of French whatever their type and supporting theory. In this paper we present the latest developments concerning the formalism and show first through a review of basic linguistic phenomena that it is a plausible minimal common ground for representing French syntax in the context of generic black box quantitative objective evaluation. For the phenomena reviewed, which include: the notion of syntactic head, apposition, control and coordination, we explain how PASSAGE representation relates to other syntactic representation schemes for French and English, slightly extending the annotation to address English when needed. Second, we describe the XML format chosen for PASSAGE and show that it is compliant with the latest propositions in terms of linguistic annotation standard. We conclude discussing the influence that corpus-based evaluation has on the characteristics of syntactic representation when willing to assess the performance of any kind of parser.

pdf bib
The OTIM Formal Annotation Model: A Preliminary Step before Annotation Scheme
Philippe Blache | Roxane Bertrand | Mathilde Guardiola | Marie-Laure Guénot | Christine Meunier | Irina Nesterenko | Berthille Pallaud | Laurent Prévot | Béatrice Priego-Valverde | Stéphane Rauzy
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Large annotation projects, typically those addressing the question of multimodal annotation in which many different kinds of information have to be encoded, have to elaborate precise and high level annotation schemes. Doing this requires first to define the structure of the information: the different objects and their organization. This stage has to be as much independent as possible from the coding language constraints. This is the reason why we propose a preliminary formal annotation model, represented with typed feature structures. This representation requires a precise definition of the different objects, their properties (or features) and their relations, represented in terms of type hierarchies. This approach has been used to specify the annotation scheme of a large multimodal annotation project (OTIM) and experimented in the annotation of a multimodal corpus (CID, Corpus of Interactional Data). This project aims at collecting, annotating and exploiting a dialogue video corpus in a multimodal perspective (including speech and gesture modalities). The corpus itself, is made of 8 hours of dialogues, fully transcribed and richly annotated (phonetics, syntax, pragmatics, gestures, etc.).