Jean-Philippe Goldman


2018

Strategies and Challenges for Crowdsourcing Regional Dialect Perception Data for Swiss German and Swiss French
Jean-Philippe Goldman | Simon Clematide | Mathieu Avanzi | Raphael Tandler
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

MIAPARLE: Online training for the discrimination of stress contrasts
Jean-Philippe Goldman | Sandra Schwab
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French
Jean-Philippe Goldman | Yves Scherrer | Julie Glikman | Mathieu Avanzi | Christophe Benzitoun | Philippe Boula de Mareüil
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Cartopho : un site web de cartographie de variantes de prononciation en français (Cartopho: a website for mapping pronunciation variants in French)
Philippe Boula de Mareüil | Jean-Philippe Goldman | Albert Rilliard | Yves Scherrer | Frédéric Vernier
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP

This work sets out to renew traditional dialect atlases by mapping pronunciation variants in French through a website. The web is used not only to collect data but also to disseminate the results to researchers and the general public. The crowdsourcing-based methodology allowed us to gather information from 2,500 French speakers in Europe (France, Belgium, Switzerland). A dynamic platform with a user-friendly interface was then developed to map the pronunciation of 70 words across the regions of the countries concerned (notably words with a mid vowel or whose final consonant may or may not be pronounced). This article presents the visualisation options by département/canton/province or by region, which combine several pronunciation features and sets of words in the form of coloured dots, hatching, etc. One can thus immediately observe, for example, a more closed /E/ (as well as a more open /O/) in Nord-Pas-de-Calais and the south of France for words such as parfait or rose, and a more closed /Œ/ in Switzerland for a word such as gueule.

2014

A Crowdsourcing Smartphone Application for Swiss German: Putting Language Documentation in the Hands of the Users
Jean-Philippe Goldman | Adrian Leeman | Marie-José Kolly | Ingrid Hove | Ibrahim Almajai | Volker Dellwo | Steven Moran
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This contribution describes an ongoing project: a smartphone application called Voice Äpp, which is a follow-up to a previous application called Dialäkt Äpp. The main purpose of both apps is to identify the user’s Swiss German dialect on the basis of the dialectal variations of 15 words. The result is returned as one or more geographical points on a map. In Dialäkt Äpp, launched in 2013, the user provides his or her own pronunciation through buttons, while the Voice Äpp, currently in development, asks users to pronounce the word and uses speech recognition techniques to identify the variants and localize the user. This second app is more challenging from a technical point of view but nevertheless recovers the nature of dialect variation of spoken language. Besides, the Voice Äpp takes its users on a journey in which they explore the individuality of their own voices, answering questions such as: How high is my voice? How fast do I speak? Do I speak faster than users in the neighbouring city?

Rhapsodie: a Prosodic-Syntactic Treebank for Spoken French
Anne Lacheret | Sylvain Kahane | Julie Beliao | Anne Dister | Kim Gerdes | Jean-Philippe Goldman | Nicolas Obin | Paola Pietrandrea | Atanas Tchobanov
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The main objective of the Rhapsodie project (ANR Rhapsodie 07 Corp-030-01) was to define rich, explicit, and reproducible schemes for the annotation of prosody and syntax in different genres (± spontaneous, ± planned, face-to-face interviews vs. broadcast, etc.), in order to study the prosody/syntax/discourse interface in spoken French, and their roles in the segmentation of speech into discourse units (Lacheret, Kahane, & Pietrandrea forthcoming). We here describe the deliverable, a syntactic and prosodic treebank of spoken French, composed of 57 short samples of spoken French (5 minutes long on average, amounting to 3 hours of speech and 33000 words), orthographically and phonetically transcribed. The transcriptions and the annotations are all aligned on the speech signal: phonemes, syllables, words, speakers, overlaps. This resource is freely available at www.projet-rhapsodie.fr. The sound samples (wav/mp3), the acoustic analysis (original F0 curve manually corrected and automatic stylized F0, pitch format), the orthographic transcriptions (txt), the microsyntactic annotations (tabular format), the macrosyntactic annotations (txt, tabular format), the prosodic annotations (xml, textgrid, tabular format), and the metadata (xml and html) can be freely downloaded under the terms of the Creative Commons licence Attribution - Noncommercial - Share Alike 3.0 France. The metadata are encoded in the IMDI-CMFI format and can be parsed on line.

C-PhonoGenre: a 7-hours corpus of 7 speaking styles in French: relations between situational features and prosodic properties
Jean-Philippe Goldman | Tea Pršir | Antoine Auchlin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Phonogenres, or speaking styles, are typified acoustic images associated with types of language activities, causing prosodic and phonostylistic variations. This communication presents a large speech corpus (7 hours) in French, extending previous work by Goldman et al. (2011a) and Simon et al. (2010) with a greater number and a complementary repertoire of phonogenres. The corpus is available with segmentation at the phonetic, syllabic and word levels, as well as manual annotation. Segmentations and annotations were produced semi-automatically, through a set of tools implemented in Praat combined with manual steps. The phonogenres are also described with a reduced set of situational dimensions, as in Lucci (1983) and Koch & Oesterreicher (2001). A preliminary acoustic study, combining comparative rhythm measurements (Dellwo 2010) with the ProsoReport of Goldman et al. (2007a), reports acoustic differences between phonogenres.

DisMo: A Morphosyntactic, Disfluency and Multi-Word Unit Annotator. An Evaluation on a Corpus of French Spontaneous and Read Speech
George Christodoulides | Mathieu Avanzi | Jean-Philippe Goldman
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present DisMo, a multi-level annotator for spoken language corpora that integrates part-of-speech tagging with basic disfluency detection and annotation, and multi-word unit recognition. DisMo is a hybrid system that uses a combination of lexical resources, rules, and statistical models based on Conditional Random Fields (CRF). In this paper, we present the first public version of DisMo for French. The system is trained and its performance evaluated on a 57k-token corpus, including different varieties of French spoken in three countries (Belgium, France and Switzerland). DisMo supports a multi-level annotation scheme, in which the tokenisation to minimal word units is complemented with multi-word unit groupings (each having associated POS tags), as well as separate levels for annotating disfluencies and discourse phenomena. We present the system’s architecture, linguistic resources and its hierarchical tag-set. Results show that DisMo achieves a precision of 95% (finest tag-set) to 96.8% (coarse tag-set) in POS-tagging non-punctuated, sound-aligned transcriptions of spoken French, while also offering substantial possibilities for automated multi-level annotation.