Basilio Calderone


ENGLAWI: From Human- to Machine-Readable Wiktionary
Franck Sajous | Basilio Calderone | Nabil Hathout
Proceedings of the 12th Language Resources and Evaluation Conference

This paper introduces ENGLAWI, a large, versatile, XML-encoded machine-readable dictionary extracted from Wiktionary. ENGLAWI contains 752,769 articles encoding the full body of information included in Wiktionary: simple words, compounds and multiword expressions, lemmas and inflectional paradigms, etymologies, phonemic transcriptions in IPA, definition glosses and usage examples, translations, semantic and morphological relations, spelling variants, etc. It is fully documented, released under a free license and supplied with G-PeTo, a series of scripts allowing easy information extraction from ENGLAWI. Additional resources extracted from ENGLAWI, such as an inflectional lexicon, a lexicon of diatopic variants and the inclusion dates of headwords in Wiktionary’s nomenclature are also provided. The paper describes the content of the resource and illustrates how it can be - and has been - used in previous studies. We finally introduce an ongoing work that computes lexicographic word embeddings from ENGLAWI’s definitions.

Glawinette: a Linguistically Motivated Derivational Description of French Acquired from GLAWI
Nabil Hathout | Franck Sajous | Basilio Calderone | Fiammetta Namer
Proceedings of the 12th Language Resources and Evaluation Conference

Glawinette is a derivational lexicon of French that will be used to feed the Démonette database. It has been created from the GLAWI machine readable dictionary. We collected couples of words from the definitions and the morphological sections of the dictionary and then selected the ones that form regular formal analogies and that instantiate frequent enough formal patterns. The graph structure of the morphological families has then been used to identify for each couple of lexemes derivational patterns that are close to the intuition of the morphologists.


Acquisition and enrichment of morphological and morphosemantic knowledge from the French Wiktionary
Nabil Hathout | Franck Sajous | Basilio Calderone
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

GLÀFF, a Large Versatile French Lexicon
Nabil Hathout | Franck Sajous | Basilio Calderone
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper introduces GLÀFF, a large-scale versatile French lexicon extracted from Wiktionary, the collaborative online dictionary. GLÀFF contains, for each entry, inflectional features and phonemic transcriptions. It distinguishes itself from the other available French lexicons by its size, its potential for constant updating and its copylefted license. We explain how we have built GLÀFF and compare it to other known resources in terms of coverage and quality of the phonemic transcriptions. We show that its size and quality are strong assets that could allow GLÀFF to become a reference lexicon for French NLP and linguistics. Moreover, other derived lexicons can easily be based on GLÀFF to satisfy specific needs of various fields such as psycholinguistics.


GLÀFF, a Large Versatile French Lexicon (GLÀFF, un Gros Lexique À tout Faire du Français) [in French]
Franck Sajous | Nabil Hathout | Basilio Calderone
Proceedings of TALN 2013 (Volume 1: Long Papers)


PHACTS about activation-based word similarity effects
Basilio Calderone | Chiara Celata
Proceedings of the Workshop on Computational Models of Language Acquisition and Loss


Learning properties of Noun Phrases: from data to functions
Valeria Quochi | Basilio Calderone
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The paper presents two experiments of unsupervised classification of Italian noun phrases. The goal of the experiments is to identify the most prominent contextual properties that allow for a functional classification of noun phrases. For this purpose, we used a Self Organizing Map trained with syntactically-annotated contexts containing noun phrases. The contexts are defined by means of a set of features representing morpho-syntactic properties of both nouns and their wider contexts. Two types of experiments have been run: one based on noun types and the other based on noun tokens. The results of the type simulation show that when frequency is the most prominent classification factor, the network isolates idiomatic or fixed phrases. The results of the token simulation experiment, instead, show that, of the 36 attributes represented in the original input matrix, only a few are prominent in the re-organization of the map. In particular, key features in the emergent macro-classification are the type of determiner and the grammatical number of the noun. An additional but no less interesting result is an organization into semantic/pragmatic micro-classes. In conclusion, our results confirm the relative prominence of determiner type and grammatical number in the task of noun (phrase) categorization.


Non-locality all the way through: Emergent Global Constraints in the Italian Morphological Lexicon
Vito Pirrelli | Basilio Calderone | Ivan Herreros | Michele Virgilio
Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology: Current Themes in Computational Phonology and Morphology