Constructing of an Ontology-based Lexicon for Bulgarian

Kiril Simov, Petya Osenova


Abstract
In this paper we report on the progress in the creation of an Ontology-based lexicon for Bulgarian. We have started with the concept set from an upper ontology (DOLCE). Then it was extended with concepts selected from the OntoWordNet, which correspond to Core WordNet and EuroWordNet Basic concepts. The underlying idea behind the ontology-based lexicon is its organization via two semantic relations - equivalence and subsumption. These relations reflect the distribution of lexical unit senses with respect to the concepts in the ontology. The lexical unit candidates for concept mapping have been selected from two large and well-developed lexical resources for Bulgarian - a machine readable explanatory dictionary and a morphological lexicon. In the initial step, the lexical units were handled that have equivalent senses to the concepts in the ontology (2500 at the moment). Then, in the second stage, we are proceeding with lexical units selected on their frequency distribution in a large Bulgarian corpus. This step is the more challenging one, since it might require also additions of concepts to the ontology. The main applications of the lexicon are envisaged to be the semantic annotation and semantic IR for Bulgarian.
Anthology ID:
L10-1586
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/848_Paper.pdf
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/848_Paper.pdf