Towards Lexical Chains for Knowledge-Graph-based Word Embeddings

Kiril Simov, Svetla Boytcheva, Petya Osenova


Abstract
Word vectors of varying dimensionality, produced by different algorithms, have been used extensively in NLP. The corpora these algorithms are trained on may contain either natural language text (e.g. Wikipedia or newswire articles) or, owing to the sparseness of natural data, artificially generated pseudo-corpora. We exploit Lexical-Chain-based templates over a Knowledge Graph to generate pseudo-corpora of controlled linguistic value, which are then used for learning word embeddings. A number of experiments were conducted on the following test sets: WordSim353 Similarity, WordSim353 Relatedness, and SimLex-999. The results show that, on the one hand, incorporating lexical chains over many relations improves the results; on the other hand, chains of unrestricted length remain difficult to handle because of their huge quantity.
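The abstract describes generating pseudo-corpora by instantiating lexical-chain templates over a knowledge graph and feeding the result to a word-embedding learner. A minimal sketch of the general idea is given below; the toy graph, the relation names, and the random-walk sampling scheme are illustrative assumptions, not the paper's actual resources or algorithm.

```python
import random

# Toy knowledge graph: node -> list of (relation, neighbor) pairs.
# Nodes and relations are invented for illustration only.
GRAPH = {
    "dog":    [("hypernym", "canine"), ("related_to", "bark")],
    "canine": [("hypernym", "mammal")],
    "mammal": [("hypernym", "animal")],
    "bark":   [("related_to", "sound")],
}

def lexical_chain(start, max_hops, rng):
    """Walk the graph from `start`, emitting node and relation tokens.
    This approximates one pseudo-sentence built from a bounded-length
    lexical chain; a length bound keeps the chain count manageable."""
    chain = [start]
    node = start
    for _ in range(max_hops):
        edges = GRAPH.get(node)
        if not edges:
            break
        rel, nxt = rng.choice(edges)
        chain.extend([rel, nxt])
        node = nxt
    return chain

def pseudo_corpus(n_sentences, max_hops, seed=0):
    """Sample a pseudo-corpus of token sequences; such sequences could
    then be passed to any word-embedding trainer (e.g. word2vec)."""
    rng = random.Random(seed)
    nodes = sorted(GRAPH)
    return [lexical_chain(rng.choice(nodes), max_hops, rng)
            for _ in range(n_sentences)]

corpus = pseudo_corpus(5, max_hops=3)
```

Each generated "sentence" alternates entity and relation tokens, so co-occurrence statistics over the pseudo-corpus reflect graph structure rather than natural text.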
Anthology ID: R17-1087
Volume: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month: September
Year: 2017
Address: Varna, Bulgaria
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 679–685
URL: https://doi.org/10.26615/978-954-452-049-6_087
DOI: 10.26615/978-954-452-049-6_087
PDF: https://doi.org/10.26615/978-954-452-049-6_087