Lorraine Goeuriot


Building Evaluation Datasets for Cultural Microblog Retrieval
Lorraine Goeuriot | Josiane Mothe | Philippe Mulhem | Eric SanJuan
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


Building Evaluation Datasets for Consumer-Oriented Information Retrieval
Lorraine Goeuriot | Liadh Kelly | Guido Zuccon | Joao Palotti
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Laypeople often have difficulty accessing relevant, correct, accurate and understandable health information online. Developing search techniques that serve these information needs is challenging. In this paper we present the datasets created by the CLEF eHealth Lab from 2013 to 2015 for evaluating search solutions that support laypeople in finding health information online. Specifically, the CLEF eHealth information retrieval (IR) task of this Lab has provided the research community with benchmarks for evaluating consumer-centered health information retrieval, thus fostering research and development aimed at addressing this challenging problem. Given consumer queries, the goal of the task is to retrieve relevant documents from the provided collection of web pages. The shared datasets provide a large health web crawl, queries representing people's real-world information needs, and relevance judgements for the queries.


Porting a Summarizer to the French Language
Rémi Bois | Johannes Leveling | Lorraine Goeuriot | Gareth J. F. Jones | Liadh Kelly
Proceedings of TALN 2014 (Volume 2: Short Papers)


Compilation of Specialized Comparable Corpora in French and Japanese
Lorraine Goeuriot | Emmanuel Morin | Béatrice Daille
Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora (BUCC)


Characterization of Scientific and Popular Science Discourse in French, Japanese and Russian
Lorraine Goeuriot | Natalia Grabar | Béatrice Daille
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We aim to characterize the comparability of corpora, and we address this issue in a trilingual context through the distinction between expert and non-expert documents. We work separately with corpora composed of documents from the medical domain in three languages (French, Japanese and Russian) that are linguistically distant from one another. In our approach, documents are characterized in each language by their topic and by a discursive typology positioned at three levels of document analysis: structural, modal and lexical. The document typology is implemented with two learning algorithms (SVMlight and C4.5). Evaluation of the results shows that the proposed discursive typology can be transposed from one language to another, as it does make it possible to distinguish the two target discourses (science and popular science). However, we observe that performance varies considerably across languages, algorithms and types of discursive characteristics.