Quantifying the Semantic Core of Gender Systems

Adina Williams, Damian Blasi, Lawrence Wolf-Sonkin, Hanna Wallach, Ryan Cotterell


Abstract
Many of the world’s languages employ grammatical gender on the lexeme. For instance, in Spanish, house “casa” is feminine, whereas the word for paper “papel” is masculine. To a speaker of a genderless language, this categorization seems to exist with neither rhyme nor reason. But, is the association of nouns to gender classes truly arbitrary? In this work, we present the first large-scale investigation of the arbitrariness of gender assignment that uses canonical correlation analysis as a method for correlating the gender of inanimate nouns with their lexical semantic meaning. We find that the gender systems of 18 languages exhibit a significant correlation with an externally grounded definition of lexical semantics.
Anthology ID:
D19-1577
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:
November
Year:
2019
Address:
Hong Kong, China
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Note:
Pages:
5734–5739
Language:
URL:
https://www.aclweb.org/anthology/D19-1577
DOI:
10.18653/v1/D19-1577
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/D19-1577.pdf