Louise Deléger


Handling Entity Normalization with no Annotated Corpus: Weakly Supervised Methods Based on Distributional Representation and Ontological Information
Arnaud Ferré | Robert Bossy | Mouhamadou Ba | Louise Deléger | Thomas Lavergne | Pierre Zweigenbaum | Claire Nédellec
Proceedings of the 12th Language Resources and Evaluation Conference

Entity normalization (or entity linking) is an important subtask of information extraction that links entity mentions in text to categories or concepts in a reference vocabulary. Machine-learning-based normalization methods adapt well as long as they have enough training data of sufficient quality per reference. Distributional representations are commonly used because of their capacity to handle different expressions with similar meanings. However, in specific technical and scientific domains, the small amount of training data and the relatively small size of specialized corpora remain major challenges. Recently, the machine-learning-based CONTES method has addressed these challenges for reference vocabularies that are ontologies, as is often the case in life sciences and biomedical domains. Yet its performance depends on a manually annotated corpus. Furthermore, as with other machine-learning-based methods, parametrization remains tricky. We propose a new approach to address the scarcity of training data that extends the CONTES method with corpus selection, pre-processing and weak supervision strategies, which can yield high-performance results without any manually annotated examples. We also study which hyperparameters are most influential, with patterns that sometimes differ from previous work. The results show that our approach significantly improves accuracy and outperforms previous state-of-the-art algorithms.


Bacteria Biotope at BioNLP Open Shared Tasks 2019
Robert Bossy | Louise Deléger | Estelle Chaix | Mouhamadou Ba | Claire Nédellec
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

This paper presents the fourth edition of the Bacteria Biotope task at BioNLP Open Shared Tasks 2019. The task focuses on the extraction of the locations and phenotypes of microorganisms from PubMed abstracts and full-text excerpts, and the characterization of these entities with respect to reference knowledge sources (NCBI taxonomy, OntoBiotope ontology). The task is motivated by the importance of knowledge about biodiversity for fundamental research and applications in microbiology. The paper describes the proposed subtasks, the corpus characteristics, and the challenge organization. We also provide an analysis of the results obtained by participants, and examine how results have evolved since the last edition in 2016.

Participation de l’équipe LAI à DEFT 2019 (Participation of team LAI in the DEFT 2019 challenge)
Jacques Hilbey | Louise Deléger | Xavier Tannier
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Défi Fouille de Textes (atelier TALN-RECITAL)

In this article we present the methods we designed and the results we obtained in our participation in task 3 of the DEFT 2019 evaluation campaign. We used simple rule-based or machine learning approaches. While our results are very good on information that is simple to extract, such as the patient's age and sex, they remain mixed on the more difficult tasks.


Combining rule-based and embedding-based approaches to normalize textual entities with an ontology
Arnaud Ferré | Louise Deléger | Pierre Zweigenbaum | Claire Nédellec
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


Overview of the Regulatory Network of Plant Seed Development (SeeDev) Task at the BioNLP Shared Task 2016
Estelle Chaix | Bertrand Dubreucq | Abdelhak Fatihi | Dialekti Valsamou | Robert Bossy | Mouhamadou Ba | Louise Deléger | Pierre Zweigenbaum | Philippe Bessières | Loic Lepiniec | Claire Nédellec
Proceedings of the 4th BioNLP Shared Task Workshop

Overview of the Bacteria Biotope Task at BioNLP Shared Task 2016
Louise Deléger | Robert Bossy | Estelle Chaix | Mouhamadou Ba | Arnaud Ferré | Philippe Bessières | Claire Nédellec
Proceedings of the 4th BioNLP Shared Task Workshop


Automatic identification of document sections for designing a French clinical corpus (Identification automatique de zones dans des documents pour la constitution d’un corpus médical en français) [in French]
Louise Deléger | Aurélie Névéol
Proceedings of TALN 2014 (Volume 2: Short Papers)

Annotation of specialized corpora using a comprehensive entity and relation scheme
Louise Deléger | Anne-Laure Ligozat | Cyril Grouin | Pierre Zweigenbaum | Aurélie Névéol
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Annotated corpora are essential resources for many applications in Natural Language Processing. They provide insight into the linguistic and semantic characteristics of the genre and domain covered, and can be used for the training and evaluation of automatic tools. In the biomedical domain, annotated corpora of English texts have become available for several genres and subfields. However, very few similar resources are available for languages other than English. In this paper we present an effort to produce a high-quality corpus of clinical documents in French, annotated with a comprehensive scheme of entities and relations. We present the annotation scheme as well as the results of a pilot annotation study covering 35 clinical documents in a variety of subfields and genres. We show that high inter-annotator agreement can be achieved using a complex annotation scheme.


Named and Specific Entity Detection in Varied Data: The Quæro Named Entity Baseline Evaluation
Olivier Galibert | Ludovic Quintard | Sophie Rosset | Pierre Zweigenbaum | Claire Nédellec | Sophie Aubin | Laurent Gillard | Jean-Pierre Raysz | Delphine Pois | Xavier Tannier | Louise Deléger | Dominique Laurent
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The Quæro program promotes research and industrial innovation on technologies for automatic analysis and classification of multimedia and multilingual documents. Within its context, a set of evaluations of Named Entity recognition systems was held in 2009. Four tasks were defined. The first two concerned traditional named entities, one in French broadcast news (a rerun of ESTER 2) and the other in OCR-ed old newspapers. The third was gene and protein name extraction in medical abstracts. The last one was the detection of references in patents. Four different partners participated, giving a total of 16 systems. We provide a synthetic description of all of them, classifying them by the main approach chosen (resource-based, rule-based or statistical), without forgetting that any modern system is at some point hybrid. The metric (the relatively standard Slot Error Rate) and the results are also presented and discussed. Finally, a process is under way, with the partners' preliminary agreement, to make all the corpora used available to the community, with the exception of the ESTER 2 corpus, which was not produced within Quæro.

Identifying Paraphrases between Technical and Lay Corpora
Louise Deléger | Pierre Zweigenbaum
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In previous work, we presented a preliminary study to identify paraphrases between technical and lay discourse types from medical corpora dedicated to the French language. In this paper, we test the hypothesis that the same kinds of paraphrases as for French can be detected between English technical and lay discourse types and report the adaptation of our method from French to English. Starting from the constitution of monolingual comparable corpora, we extract two kinds of paraphrases: paraphrases between nominalizations and verbal constructions, and paraphrases between neo-classical compounds and modern-language phrases. We do this by relying on morphological resources and a set of extraction rules that we adapt from the original approach for French. Results show that paraphrases can be identified with rather good precision, and that these types of paraphrase are relevant in the context of the opposition between technical and lay discourse types. These observations are consistent with the results obtained for French, which demonstrates the portability of the approach as well as the similarity of the two languages in their use of those kinds of expressions in technical and lay discourse types.


Extracting Lay Paraphrases of Specialized Expressions from Monolingual Comparable Medical Corpora
Louise Deléger | Pierre Zweigenbaum
Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora (BUCC)