Antonio Balvet


2014

TALC-sef A Manually-Revised POS-TAgged Literary Corpus in Serbian, English and French
Antonio Balvet | Dejan Stosic | Aleksandra Miletic
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we present a parallel literary corpus for Serbian, English and French, the TALC-sef corpus. The corpus includes a manually-revised POS-tagged reference Serbian corpus of over 150,000 words. The initial objective was to devise a reference parallel corpus in the three languages, both for literary and linguistic studies. The French and English sub-corpora had been POS-tagged from the outset, using TreeTagger (Schmid, 1994), but the corpus lacked, until now, a tagged version of the Serbian sub-corpus. Here, we present the original parallel literary corpus, then we address issues related to POS-tagging a large collection of Serbian text: from the design of an appropriate tagset for Serbian, to the choice of an automatic POS-tagger suited to the task, and then to some quantitative and qualitative results. We then discuss near-future perspectives for further annotation of the whole parallel corpus.

2010

Building a Lexicon of French Deverbal Nouns from a Semantically Annotated Corpus
Antonio Balvet | Lucie Barque | Rafael Marín
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the Nomage project, which aims at describing the aspectual properties of deverbal nouns in an empirical way. It is centered on the development of two resources: a semantically annotated corpus of deverbal nouns, and an electronic lexicon. We present both in this paper and emphasize how the semantic annotations of the corpus allow the lexicographic description of deverbal nouns to be validated, in particular their polysemy. Nominalizations have occupied a central place in grammatical analysis, with a focus on morphological and syntactic aspects. More recently, researchers have begun to address a specific issue often neglected before, i.e. the semantics of nominalizations, and its implications for Natural Language Processing applications such as electronic ontologies or Information Retrieval. We focus on precisely this issue in the research project NOMAGE, funded by the French National Research Agency (ANR-07-JCJC-0085-01). In this paper, we present the Nomage corpus and the annotations we make on deverbal nouns (section 2). We then show how we build our lexicon with the semantically annotated corpus and illustrate the kind of generalizations we can make from such data (section 3).

The Creagest Project: a Digitized and Annotated Corpus for French Sign Language (LSF) and Natural Gestural Languages
Antonio Balvet | Cyril Courtin | Dominique Boutet | Christian Cuxac | Ivani Fusellier-Souza | Brigitte Garcia | Marie-Thérèse L’Huillier | Marie-Anne Sallandre
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we discuss the theoretical, sociolinguistic, methodological and technical objectives and issues of the French Creagest Project (2007-2012) in setting up, documenting and annotating a large corpus of adult and child French Sign Language (LSF) and of natural gestural language. The main objective of this ANR-funded research project is to set up a collaborative web-based platform for the study of semiogenesis in LSF (French Sign Language), i.e. the study of emerging structures and signs, be they used by Deaf adult signers, Deaf children, or even by Deaf and hearing subjects in interaction. In section 2, we address theoretical and practical issues, emphasizing the outstanding features of the Creagest Project. In section 3, we deal with methodological issues for data collection. Finally, in section 4, we examine technical aspects of LSF video data editing and corpus annotation, in the perspective of setting up a corpus-based formalized description of LSF.

2009

The NOMAGE Project Annotating the semantic features of French nominalizations (project abstract)
Antonio Balvet | Pauline Haas | Richard Huyghe | Anne Jugnet | Rafael Marín
Proceedings of the Eighth International Conference on Computational Semantics