João Miguel Casteleiro

Also published as: João Casteleiro


Context Sense Clustering for Translation
João Casteleiro | Gabriel Lopes | Joaquim Silva
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation


COMBINA-PT: A Large Corpus-extracted and Hand-checked Lexical Database of Portuguese Multiword Expressions
Amália Mendes | Sandra Antunes | Maria Fernanda Bacelar do Nascimento | João Miguel Casteleiro | Luísa Pereira | Tiago Sá
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents the COMBINA-PT project, a study of corpus-extracted Portuguese multiword (MW) expressions. The objective of this ongoing project is to compile a large lexical database of MW units of the Portuguese language, automatically extracted from a balanced 50-million-word corpus and manually validated with the help of lexical association measures. MW expressions considered in the database include named entities and lexical associations with different degrees of cohesion, ranging from frozen groups, which undergo little or no variation, to lexical collocations composed of words that tend to occur together and that constitute syntactic dependencies, although with a low degree of fixedness. This new resource has a twofold objective: (i) to be an important research tool that supports the development of MW expression typologies and their lexicographic treatment; (ii) to be of major help in developing and evaluating language processing tools capable of dealing with MW expressions.