Anne Garcia-Fernandez


2014

Construction and Annotation of a French Folkstale Corpus
Anne Garcia-Fernandez | Anne-Laure Ligozat | Anne Vilnat
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we present the digitization and annotation of a corpus of tales composed of historical texts (with Old French parts). To our knowledge, it is the only available French tales corpus classified according to the Aarne-Thompson classification. We first studied whether the pre-processing tools, namely OCR and PoS-tagging, are accurate enough to allow automatic analysis. We also manually annotated this corpus with several types of information that could prove useful for future work: character references, episodes, and motifs. The contributions are the creation of a corpus of French tales from classical anthropology material, which will be made available to the community; the evaluation of OCR and NLP tools on this corpus; and the annotation with anthropological information.

Evaluation of different strategies for domain adaptation in opinion mining
Anne Garcia-Fernandez | Olivier Ferret | Marco Dinarelli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The work presented in this article belongs to the field of opinion mining and aims more specifically at finding the polarity of a text by relying on machine learning methods. In this context, it studies various strategies for adapting a statistical classifier to a new domain when training data exist only for one or several other domains. The study shows that a self-training procedure, which enlarges the initial training corpus with texts from the target domain that were reliably classified by the classifier, is the most successful and stable strategy for the tested domains. Moreover, this strategy obtains better results in most cases than the method of Blitzer et al. (2007) on the same evaluation corpus, while being simpler.

2013

Studying frequency-based approaches to process lexical simplification (Approches à base de fréquences pour la simplification lexicale) [in French]
Anne-Laure Ligozat | Cyril Grouin | Anne Garcia-Fernandez | Delphine Bernhard
Proceedings of TALN 2013 (Volume 1: Long Papers)

2012

Etude de différentes stratégies d’adaptation à un nouveau domaine en fouille d’opinion (Study of various strategies for adapting an opinion classifier to a new domain) [in French]
Anne Garcia-Fernandez | Olivier Ferret
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking
Anne-Laure Ligozat | Cyril Grouin | Anne Garcia-Fernandez | Delphine Bernhard
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2010

MACAQ : A Multi Annotated Corpus to Study how we Adapt Answers to Various Questions
Anne Garcia-Fernandez | Sophie Rosset | Anne Vilnat
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents a corpus of human answers in natural language, collected in order to build a base of examples useful for generating natural language answers. We present the corpus and the way we acquired it. Answers correspond to questions with fixed linguistic form, focus, and topic. Answers to a given question exist for two modalities of interaction: oral and written. The whole corpus of answers was annotated manually and automatically on different levels, including words from the question that are reused in the answer, the precise element answering the question (or information-answer), and completions. A detailed description of the annotations is presented. Two examples of corpus analyses are described. The first analysis shows some differences between the oral and written modalities, especially in terms of answer length. The second analysis concerns the reuse of the question focus in the answers.