Luís Miguel Cabral

Also published as: Luís Cabral


2010

GikiCLEF: Crosscultural Issues in Multilingual Information Access
Diana Santos | Luís Miguel Cabral | Corina Forascu | Pamela Forner | Fredric Gey | Katrin Lamm | Thomas Mandl | Petya Osenova | Anselmo Peñas | Álvaro Rodrigo | Julia Schulz | Yvonne Skalban | Erik Tjong Kim Sang
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we describe GikiCLEF, the first evaluation contest that, to our knowledge, was specifically designed to expose and investigate cultural and linguistic issues involved in structured multimedia collections and searching, and which was organized under the scope of CLEF 2009. GikiCLEF evaluated systems that answered hard questions for both human and machine, in ten different Wikipedia collections, namely Bulgarian, Dutch, English, German, Italian, Norwegian (Bokmål and Nynorsk), Portuguese, Romanian, and Spanish. After a short historical introduction, we present the task, together with its motivation, and discuss how the topics were chosen. Then we provide another description from the point of view of the participants. Before disclosing their results, we introduce the SIGA management system explaining the several tasks which were carried out behind the scenes. We quantify in turn the GIRA resource, offered to the community for training and further evaluating systems with the help of the 50 topics gathered and the solutions identified. We end the paper with a critical discussion of what was learned, advancing possible ways to reuse the data.

2006

Corpógrafo V3 - From Terminological Aid to Semi-automatic Knowledge Engineering
Luís Sarmento | Belinda Maia | Diana Santos | Ana Pinto | Luís Cabral
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we will present Corpógrafo, a mature web-based environment for working with corpora, for terminology extraction, and for ontology development. We will explain Corpógrafo’s workflow and describe the most important information extraction methods used, namely its term extraction, and definition / semantic relations identification procedures. We will describe current Corpógrafo users and present a brief overview of the XML format currently used to export terminology databases. Finally, we present future improvements for this tool.