Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality

Thomas Pickard


Abstract
This paper explores the use of word2vec and GloVe embeddings for unsupervised measurement of the semantic compositionality of MWE candidates. Through comparison with several human-annotated reference sets, we find word2vec to be substantively superior to GloVe for this task. We also find Simple English Wikipedia to be a poor-quality resource for compositionality assessment, but demonstrate that a sample of 10% of sentences in the English Wikipedia can provide a conveniently tractable corpus with only moderate reduction in the quality of outputs.
Anthology ID:
2020.mwe-1.12
Volume:
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
Month:
December
Year:
2020
Address:
online
Venues:
COLING | MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Note:
Pages:
95–100
Language:
URL:
https://www.aclweb.org/anthology/2020.mwe-1.12
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2020.mwe-1.12.pdf