English WordNet Random Walk Pseudo-Corpora

Filip Klubička, Alfredo Maldonado, Abhijit Mahalunkar, John Kelleher


Abstract
This is a resource description paper that describes the creation and properties of a set of pseudo-corpora generated artificially from a random walk over the English WordNet taxonomy. Our WordNet taxonomic random walk implementation allows the exploration of different random walk hyperparameters and the generation of a variety of different pseudo-corpora. We find that different combinations of parameters result in varying statistical properties of the generated pseudo-corpora. We have published a total of 81 pseudo-corpora that we have used in our previous research, but have not exhausted all possible combinations of hyperparameters, which is why we have also published a codebase that allows the generation of additional WordNet taxonomic pseudo-corpora as needed. Ultimately, such pseudo-corpora can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.
Anthology ID:
2020.lrec-1.602
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venues:
COLING | LREC
SIG:
Publisher:
European Language Resources Association
Note:
Pages:
4893–4902
Language:
English
URL:
https://www.aclweb.org/anthology/2020.lrec-1.602
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2020.lrec-1.602.pdf