Nathalie Friburger


2015

Arabic Named Entity Recognition Process using Transducer Cascade and Arabic Wikipedia
Fatma Ben Mesmia | Kais Haddar | Denis Maurel | Nathalie Friburger
Proceedings of the International Conference Recent Advances in Natural Language Processing

2013

Mining Partial Annotation Rules for Named Entity Recognition (Fouille de règles d’annotation partielles pour la reconnaissance des entités nommées) [in French]
Damien Nouvel | Jean-Yves Antoine | Nathalie Friburger | Arnaud Soulet
Proceedings of TALN 2013 (Volume 1: Long Papers)

Supervised learning on encyclopaedic resources for the extension of a lexicon of proper names dedicated to the recognition of named entities (Apprentissage supervisé sur ressources encyclopédiques pour l’enrichissement d’un lexique de noms propres destiné à la reconnaissance des entités nommées) [in French]
Nadia Okinina | Damien Nouvel | Nathalie Friburger | Jean-Yves Antoine
Proceedings of TALN 2013 (Volume 2: Short Papers)

CasSys, a free transducer cascade system (CasSys Un système libre de cascades de transducteurs) [in French]
Denis Maurel | Nathalie Friburger
Proceedings of TALN 2013 (Volume 3: System Demonstrations)

2012

Coupling Knowledge-Based and Data-Driven Systems for Named Entity Recognition
Damien Nouvel | Jean-Yves Antoine | Nathalie Friburger | Arnaud Soulet
Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data

2010

Eslo: From Transcription to Speakers’ Personal Information Annotation
Iris Eshkol | Denis Maurel | Nathalie Friburger
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents preliminary work on putting a French oral corpus and its transcription online. The corpus is the Socio-Linguistic Survey in Orleans, carried out in 1968. First, we digitized the corpus, then transcribed it manually with the Transcriber software, adding tags for speakers, time, noise, etc. Each document (audio file and XML transcription file) was described by a set of metadata stored in an XML format to allow easy consultation. Second, we added different levels of annotation: recognition of named entities and annotation of personal information about speakers. Both annotation tasks used the CasSys system of transducer cascades. We used and modified a first cascade to recognize named entities, then built a second cascade to annotate the designating entities, i.e. information about the speaker. This second cascade parses the corpus already annotated with named entities. The objective is to locate information about the speaker and to determine what kind of information can designate him/her. Both cascades were evaluated with precision and recall measures.
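
The two-stage pipeline described in this abstract can be illustrated with a minimal sketch. It is not the CasSys implementation: plain regular expressions stand in for Unitex transducers, and the tag names (`pers`, `speaker.name`), the patterns, and the span-based scoring are assumptions made only for illustration. The point it shows is that the second pass runs on the output of the first, and that results are scored by precision and recall.

```python
import re


def tag_named_entities(text):
    """Stage 1 (hypothetical): mark person names introduced by a title."""
    return re.sub(r"\b(?:Mr|Mrs) [A-Z][a-z]+",
                  lambda m: "<pers>" + m.group(0) + "</pers>",
                  text)


def tag_speaker_info(text):
    """Stage 2 (hypothetical): relies on the <pers> tags left by stage 1."""
    return re.sub(r"(my name is )(<pers>.*?</pers>)",
                  r"\1<speaker.name>\2</speaker.name>",
                  text)


def run_cascade(text, stages):
    """Apply each stage in order; each one reads the previous stage's output."""
    for stage in stages:
        text = stage(text)
    return text


def precision_recall(predicted, gold):
    """Precision and recall over sets of annotated spans (exact match)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


annotated = run_cascade("my name is Mr Dupont and I met Mrs Martin",
                        [tag_named_entities, tag_speaker_info])
```

The order of the stages matters: run alone on raw text, `tag_speaker_info` finds nothing, because its pattern depends on the `<pers>` tags produced by the first pass.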

An Analysis of the Performances of the CasEN Named Entities Recognition System in the Ester2 Evaluation Campaign
Damien Nouvel | Jean-Yves Antoine | Nathalie Friburger | Denis Maurel
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present a detailed and critical analysis of the behaviour of the CasEN named entity recognition system during the French Ester2 evaluation campaign. In this project, CasEN was confronted with the task of detecting and categorizing named entities in manual and automatic transcriptions of radio broadcasts. First, we give a general presentation of the Ester2 campaign. Then, we describe our system, which is based on transducers. Next, we describe how systems were evaluated during this campaign and report the main official results. Afterwards, we investigate in detail the influence of some annotation biases that significantly affected the estimation of system performance. Finally, we conduct an in-depth analysis of the actual errors of the CasEN system, which gives useful indications about the phenomena that caused them (e.g. metonymy, encapsulation, detection of right boundaries) and that remain challenges for named entity recognition systems.

2008

Automatic Rich Annotation of Large Corpus of Conversational transcribed speech: the Chunking Task of the EPAC Project
Jean-Yves Antoine | Abdenour Mokrane | Nathalie Friburger
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes the use of the CasSys platform to chunk conversational speech transcripts by means of cascades of Unitex transducers. Our system is part of the EPAC project of the French National Research Agency (ANR). The aim of this project is to develop robust methods for the annotation of audio/multimedia document collections that contain conversational speech sequences, such as TV or radio programs. First, this paper presents the EPAC project and the adaptation of a former chunking system (Romus), which was developed in the restricted framework of dedicated spoken man-machine dialogue. Then, it describes the problems that arise from 1) spontaneous speech disfluencies and 2) errors from the previous stages of processing (automatic speech recognition and POS tagging).