Exploring Knowledge Bases for Similarity

Eneko Agirre, Montse Cuadros, German Rigau, Aitor Soroa


Abstract
Graph-based similarity over WordNet has been previously shown to perform very well on word similarity. This paper presents a study of the performance of such a graph-based algorithm when using different relations and versions of Wordnet. The graph algorithm is based on Personalized PageRank, a random-walk based algorithm which computes the probability of a random-walk initiated in the target word to reach any synset following the relations in WordNet (Haveliwala, 2002). Similarity is computed as the cosine of the probability distributions for each word over WordNet. The best combination of relations includes all relations in WordNet 3.0, included disambiguated glosses, and automatically disambiguated topic signatures called KnowNets. All relations are part of the official release of WordNet, except KnowNets, which have been derived automatically. The results over the WordSim 353 dataset show that using the adequate relations the performance improves over previously published WordNet-based results on the WordSim353 dataset (Finkelstein et al., 2002). The similarity software and some graphs used in this paper are publicly available at http://ixa2.si.ehu.es/ukb.
Anthology ID:
L10-1367
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/534_Paper.pdf
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/534_Paper.pdf