Fahad Khan

Also published as: Anas Fahad Khan


2020

Modelling Frequency and Attestations for OntoLex-Lemon
Christian Chiarcos | Maxim Ionov | Jesse de Does | Katrien Depuydt | Anas Fahad Khan | Sander Stolk | Thierry Declerck | John Philip McCrae
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

The OntoLex vocabulary enjoys increasing popularity as a means of publishing lexical resources with RDF and as Linked Data. The recent publication of a new OntoLex module for lexicography, lexicog, reflects its increasing importance for digital lexicography. However, not all aspects of digital lexicography have been covered to the same extent. In particular, supplementary information drawn from corpora, such as frequency information, links to attestations, and collocation data, was considered to be beyond the scope of lexicog. Therefore, the OntoLex community has put forward a proposal for a novel module for frequency, attestation and corpus information (FrAC) that not only covers the requirements of digital lexicography, but also accommodates essential data structures for lexical information in natural language processing. This paper introduces the current state of the OntoLex-FrAC vocabulary and describes its structure, some selected use cases, elementary concepts and fundamental definitions, with a focus on frequency and attestations.

Modelling Etymology in LMF/TEI: The Grande Dicionário Houaiss da Língua Portuguesa Dictionary as a Use Case
Fahad Khan | Laurent Romary | Ana Salgado | Jack Bowers | Mohamed Khemakhem | Toma Tasovac
Proceedings of the 12th Language Resources and Evaluation Conference

In this article we introduce two of the new parts of the multi-part version of the Lexical Markup Framework (LMF) ISO standard, namely Part 3 (ISO 24613-3), which deals with etymological and diachronic data, and Part 4 (ISO 24613-4), which consists of a TEI serialisation of all of the prior parts of the model. We demonstrate the use of both standards by describing the LMF encoding of a small number of examples taken from a sample conversion of the reference Portuguese dictionary Grande Dicionário Houaiss da Língua Portuguesa, part of a broader experiment comprising the analysis of different, heterogeneously encoded, Portuguese lexical resources. We present the examples in the Unified Modelling Language (UML) and also, in a couple of cases, in TEI.

A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi | John Philip McCrae | Sanni Nimb | Fahad Khan | Monica Monachini | Bolette Pedersen | Thierry Declerck | Tanja Wissik | Andrea Bellandi | Irene Pisani | Thomas Troelsgård | Sussi Olsen | Simon Krek | Veronika Lipp | Tamás Váradi | László Simon | András Gyorffy | Carole Tiberius | Tanneke Schoonheim | Yifat Ben Moshe | Maya Rudich | Raya Abu Ahmad | Dorielle Lonke | Kira Kovalenko | Margit Langemets | Jelena Kallas | Oksana Dereza | Theodorus Fransen | David Cillessen | David Lindemann | Mikel Alonso | Ana Salgado | José Luis Sancho | Rafael-J. Ureña-Ruiz | Jordi Porta Zamorano | Kiril Simov | Petya Osenova | Zara Kancheva | Ivaylo Radev | Ranka Stanković | Andrej Perdih | Dejan Gabrovsek
Proceedings of the 12th Language Resources and Evaluation Conference

Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in alignment and evaluation of word senses by creating new solutions, particularly those that notoriously require large amounts of data, such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.

Representing Temporal Information in Lexical Linked Data Resources
Fahad Khan
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)

The increasing recognition of the utility of Linked Data as a means of publishing lexical resources has helped to underline the need for RDF-based data models that have the flexibility and expressivity to represent the most salient kinds of information contained in such resources as structured data, including, notably, information relating to time and the temporal dimension. In this article we describe a perdurantist approach to modelling diachronic lexical information, which builds upon work we have previously presented and is based on the ontolex-lemon vocabulary. We present two extended examples, one taken from the Oxford English Dictionary, the other from a work on etymology, to show how our approach can handle different kinds of temporal information often found in lexical resources.

2018

One Language to rule them all: modelling Morphological Patterns in a Large Scale Italian Lexicon with SWRL
Fahad Khan | Andrea Bellandi | Francesca Frontini | Monica Monachini
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Situating Word Senses in their Historical Context with Linked Data
Fahad Khan | Jack Bowers | Francesca Frontini
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)
Francesca Frontini | Larisa Grčić Simeunović | Špela Vintar | Anas Fahad Khan | Artemis Parvisi
Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)

Designing an Ontology for the Study of Ritual in Ancient Greek Tragedy
Gloria Mugelli | Andrea Bellandi | Federico Boschetti | Anas Fahad Khan
Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)

2016

Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF
Ouafae Nahli | Francesca Frontini | Monica Monachini | Fahad Khan | Arsalan Zarghili | Mustapha Khalfi
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the conversion into LMF, a standard lexicographic digital format, of ‘al-qāmūs al-muḥīṭ, a Medieval Arabic lexicon. The lexicon is first described, then all the steps required for the conversion are illustrated. The work will produce a useful lexicographic resource for Arabic NLP, but is also interesting per se, as a way to study the implications of adapting the LMF model to the Arabic language. Some reflections are offered as to the status of roots with respect to previously suggested representations. In particular, roots, in our opinion, are not to be treated as lexical entries, but modeled as lexical metadata for classifying and identifying lexical entries. In this manner, each root connects all entries that are derived from it.

LREC as a Graph: People and Resources in a Network
Riccardo Del Gratta | Francesca Frontini | Monica Monachini | Gabriella Pardelli | Irene Russo | Roberto Bartolini | Fahad Khan | Claudia Soria | Nicoletta Calzolari
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This proposal describes a new way to visualise resources in the LREMap, a community-built repository of language resource descriptions and uses. The LREMap is represented as a force-directed graph, where resources, papers and authors are nodes. The analysis of the visual representation of the underlying graph is used to study how the community gathers around LRs and how LRs are used in research.

Tools and Instruments for Building and Querying Diachronic Computational Lexica
Fahad Khan | Andrea Bellandi | Monica Monachini
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)

This article describes work on enabling the addition of temporal information to senses of words in linguistic linked open data lexica based on the lemonDia model. Our contribution in this article is twofold. On the one hand, we demonstrate how lemonDia enables the querying of diachronic lexical datasets using OWL-oriented Semantic Web based technologies. On the other hand, we present a preliminary version of an interactive interface intended to help users in creating lexical datasets that model meaning change over time.

2015

Using Ontologies to Model Polysemy in Lexical Resources
Fahad Khan | Francesca Frontini
Proceedings of the 1st Workshop on Language and Ontologies

2014

The IMAGACT Visual Ontology. An Extendable Multilingual Infrastructure for the representation of lexical encoding of Action
Massimo Moneglia | Susan Brown | Francesca Frontini | Gloria Gagliardi | Fahad Khan | Monica Monachini | Alessandro Panunzi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Action verbs have many meanings, covering actions in different ontological types. Moreover, each language categorizes action in its own way. One verb can refer to many different actions and one action can be identified by more than one verb. The range of variations within and across languages is largely unknown, causing trouble for natural language processing tasks. IMAGACT is a corpus-based ontology of action concepts, derived from English and Italian spontaneous speech corpora, which makes use of the universal language of images to identify the different action types extended by verbs referring to action in English, Italian, Chinese and Spanish. This paper presents the infrastructure and the various kinds of linguistic information the user can derive from it. IMAGACT makes explicit the variation of meaning of action verbs within one language and allows comparisons of verb variations within and across languages. Because the action concepts are represented with videos, extension into new languages beyond those presently implemented in IMAGACT is done using competence-based judgments by mother-tongue informants, without intense lexicographic work involving underdetermined semantic description.

2013

Generative Lexicon Theory and Linguistic Linked Open Data
Fahad Khan | Francesca Frontini | Riccardo Del Gratta | Monica Monachini | Valeria Quochi
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)

Disambiguation of Basic Action Types through Nouns’ Telic Qualia
Irene Russo | Francesca Frontini | Irene De Felice | Fahad Khan | Monica Monachini
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)

2012

Verb interpretation for basic action types: annotation, ontology induction and creation of prototypical scenes
Francesca Frontini | Irene De Felice | Fahad Khan | Irene Russo | Monica Monachini | Gloria Gagliardi | Alessandro Panunzi
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon