Fausto Giunchiglia


A Major Wordnet for a Minority Language: Scottish Gaelic
Gábor Bella | Fiona McNeill | Rody Gorman | Caoimhin O Donnaile | Kirsty MacDonald | Yamini Chandrashekar | Abed Alhakim Freihat | Fausto Giunchiglia
Proceedings of the 12th Language Resources and Evaluation Conference

We present a new wordnet resource for Scottish Gaelic, a Celtic minority language spoken by about 60,000 speakers, most of whom live in Northwestern Scotland. The wordnet contains over 15 thousand word senses and was constructed by merging ten thousand new, high-quality translations, provided and validated by language experts, with an existing wordnet derived from Wiktionary. This new, considerably extended wordnet—currently among the 30 largest in the world—targets multiple communities: language speakers and learners; linguists; computer scientists solving problems related to natural language processing. By publishing it as a freely downloadable resource, we hope to contribute to the long-term preservation of Scottish Gaelic as a living language, both offline and on the Web.

Exploring the Language of Data
Gábor Bella | Linda Gremes | Fausto Giunchiglia
Proceedings of the 28th International Conference on Computational Linguistics

We set out to uncover the unique grammatical properties of an important yet so far under-researched type of natural language text: that of short labels typically found within structured datasets. We show that such labels obey a specific type of abbreviated grammar that we call the Language of Data, with properties significantly different from the kinds of text typically addressed in computational linguistics and NLP, such as ‘standard’ written language or social media messages. We analyse orthography, parts of speech, and syntax over a large, bilingual, hand-annotated corpus of data labels collected from a variety of domains. We perform experiments on tokenisation, part-of-speech tagging, and named entity recognition over real-world structured data, demonstrating that models adapted to the Language of Data outperform those trained on standard text. These observations point in a new direction to be explored as future research, in order to develop new NLP tools and models dedicated to the Language of Data.


CogNet: A Large-Scale Cognate Database
Khuyagbaatar Batsuren | Gabor Bella | Fausto Giunchiglia
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper introduces CogNet, a new, large-scale lexical database that provides cognates (words of common origin and meaning) across languages. The database currently contains 3.1 million cognate pairs across 338 languages using 35 writing systems. The paper also describes the automated method by which cognates were computed from publicly available wordnets, with an accuracy evaluated at 94%. Finally, it presents statistics about the cognate data and some initial insights into it, hinting at a possible future exploitation of the resource by various fields of linguistics.


TrentoTeam at SemEval-2017 Task 3: An application of Grice Maxims in Ranking Community Question Answers
Mohammed R. H. Qwaider | Abed Alhakim Freihat | Fausto Giunchiglia
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper we present the TrentoTeam system, which participated in Task 3 at SemEval-2017 (Nakov et al., 2017). We concentrated our work on applying Grice Maxims (used in many state-of-the-art machine learning applications (Vogel et al., 2013; Kheirabadi and Aghagolzadeh, 2012; Dale and Reiter, 1995; Franke, 2011)) to ranking the answers to a question by answer relevancy. In particular, we created a ranker system based on relevancy scores assigned by three main components: named entity recognition, similarity score, and sentiment analysis. Our system obtained results comparable to those of machine learning systems.


Natural Language driven Image Generation
Giovanni Adorni | Mauro Di Manzo | Fausto Giunchiglia
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics