Discovering Phonesthemes with Sparse Regularization

Nelson F. Liu, Gina-Anne Levow, Noah A. Smith


Abstract
We introduce a simple method for extracting non-arbitrary form-meaning representations from a collection of semantic vectors. We treat the problem as one of feature selection for a model trained to predict word vectors from subword features. We apply this model to the problem of automatically discovering phonesthemes, which are submorphemic sound clusters that appear in words with similar meaning. Many of our model-predicted phonesthemes overlap with those proposed in the linguistics literature, and we validate our approach with human judgments.
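The feature-selection idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: a group-lasso penalty stands in for the sparse regularizer, word-initial character bigrams stand in for sound clusters, and the 2-d "semantic vectors" are invented toy values rather than real embeddings. The names `VECTORS`, `onset`, and `fit_group_lasso` are hypothetical and not taken from the paper.

```python
import math

# Invented toy 2-d "semantic vectors" (assumption: NOT real embeddings).
VECTORS = {
    "glow": (1.0, 0.1), "gleam": (0.9, 0.0),
    "glint": (1.1, -0.1), "glitter": (1.0, 0.0),
    "sniff": (0.0, 1.0), "snout": (0.1, 0.9),
    "snore": (-0.1, 1.1), "sneeze": (0.0, 1.0),
    "table": (0.1, -0.1), "window": (-0.1, 0.1),
    "garden": (0.1, 0.1), "pencil": (-0.1, -0.1),
}


def onset(word):
    """Hypothetical subword feature: the first two characters of the word."""
    return word[:2]


def fit_group_lasso(vectors, lam=0.05, iters=200):
    """Predict word vectors from onset indicators with a row-sparse penalty.

    Minimizes (1/2n) * ||XW - Y||_F^2 + lam * sum_j ||W_j||_2 by proximal
    gradient descent. Each row W_j holds all weights of one feature, so a
    feature is either kept or zeroed out as a whole.
    """
    words = sorted(vectors)
    feats = sorted({onset(w) for w in words})
    n, m = len(words), len(feats)
    dim = len(vectors[words[0]])
    X = [[1.0 if onset(w) == f else 0.0 for f in feats] for w in words]
    Y = [vectors[w] for w in words]
    # n / ||X||_F^2 is a safe step size: ||X||_F^2 / n bounds the
    # Lipschitz constant of the gradient of the squared-error term.
    step = n / sum(x * x for row in X for x in row)
    W = [[0.0] * dim for _ in range(m)]
    for _ in range(iters):
        # Residual of the current predictions: R = XW - Y.
        R = [[sum(X[i][j] * W[j][k] for j in range(m)) - Y[i][k]
              for k in range(dim)] for i in range(n)]
        for j in range(m):
            grad = [sum(X[i][j] * R[i][k] for i in range(n)) / n
                    for k in range(dim)]
            z = [W[j][k] - step * grad[k] for k in range(dim)]
            norm = math.sqrt(sum(v * v for v in z))
            # Group soft-thresholding: shrink the whole row toward zero.
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            W[j] = [scale * v for v in z]
    return dict(zip(feats, W))


weights = fit_group_lasso(VECTORS)
# Features whose weight row survived the penalty are the candidates.
selected = sorted(f for f, w in weights.items() if any(abs(v) > 1e-9 for v in w))
print(selected)
```

In this toy setup, only onsets shared by several words with similar vectors keep nonzero weights. Onsets that occur once, with vectors near zero, are pruned by the penalty. Raising `lam` prunes more aggressively.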
Anthology ID: W18-1206
Volume: Proceedings of the Second Workshop on Subword/Character LEvel Models
Month: June
Year: 2018
Address: New Orleans
Venues: NAACL | SCLeM | WS
Publisher: Association for Computational Linguistics
Pages: 49–54
URL: https://www.aclweb.org/anthology/W18-1206
DOI: 10.18653/v1/W18-1206
PDF: http://aclanthology.lst.uni-saarland.de/W18-1206.pdf