A Thesaurus for Biblical Hebrew

Miriam Azar, Aliza Pahmer, Joshua Waxman


Abstract
We built a thesaurus for Biblical Hebrew that connects roots by phonetic, semantic, and distributional similarity. To this end, we apply established algorithms to existing lexicons and other digital resources to find connections between headwords. For semantic similarity, we use the cosine similarity of tf-idf vectors built from the English gloss text of Hebrew headwords in Ernest Klein’s A Comprehensive Etymological Dictionary of the Hebrew Language for Readers of English and in the Brown-Driver-Briggs Hebrew Lexicon. For phonetic similarity, we digitize part of Matityahu Clark’s Etymological Dictionary of Biblical Hebrew, grouping Hebrew roots into phonemic classes, and use it to establish phonetic relationships between headwords in Klein’s dictionary. For distributional similarity, we consider the cosine similarity of PPMI vectors of Hebrew roots and also, in a somewhat novel approach, apply Word2Vec to a Biblical corpus reduced to its lexemes. The resulting resource is helpful to those trying to understand Biblical Hebrew and is also a good basis for programs that process the Biblical text.
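The semantic-similarity step compares headwords through the English glosses that define them. The following is a minimal sketch of that idea, assuming glosses are available as plain strings keyed by headword; the transliterated headwords and glosses below are invented toy data rather than entries from Klein or Brown-Driver-Briggs, and the tokenizer and smoothed idf formula are illustrative choices, not necessarily those used for the thesaurus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only; a deliberately simple tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(glosses):
    """Map each headword to a sparse tf-idf vector (dict: term -> weight)."""
    docs = {hw: Counter(tokenize(text)) for hw, text in glosses.items()}
    n = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())  # document frequency: one count per gloss
    # Smoothed idf, so a term found in every gloss still keeps a small weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return {hw: {t: tf * idf[t] for t, tf in counts.items()}
            for hw, counts in docs.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(vectors, headword, k=3):
    """Rank the other headwords by cosine similarity of their gloss vectors."""
    scores = [(other, cosine(vectors[headword], vec))
              for other, vec in vectors.items() if other != headword]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:k]


# Hypothetical toy glosses (transliterated headwords), for illustration only.
glosses = {
    "shmr": "to keep, guard, watch over",
    "ntsr": "to guard, watch, keep",
    "akl": "to eat, devour, consume",
    "lhm": "bread, food",
}
vecs = tfidf_vectors(glosses)
print(most_similar(vecs, "shmr", k=2))
```

Because the vectors come from gloss text, two headwords are linked only when their dictionary definitions share weighted vocabulary. This is why synonyms with similar glosses rank highly even when their roots look nothing alike.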
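The phonetic-similarity step links roots whose consonants fall into the same phonemic class. Clark’s actual classes are not reproduced here. As a hypothetical stand-in, this sketch uses the five traditional places of articulation (gutturals, labials, palatals, linguals, dentals) and treats two roots as related when they differ in exactly one consonant and the two differing consonants share a class. Both the grouping and the "one substitution" rule are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical phonemic classes: the five traditional places of articulation.
# This is an illustrative stand-in, not Clark's classification.
GROUPS = ["אהחע", "בומפ", "גיכק", "דטלנת", "זסשרצ"]
CLASS = {ch: i for i, group in enumerate(GROUPS) for ch in group}

# Fold final letter forms onto their regular forms before comparing.
FINALS = str.maketrans("ךםןףץ", "כמנפצ")


def normalize(root):
    return root.translate(FINALS)


def phonetically_related(a, b):
    """True if the roots differ in exactly one consonant of the same class."""
    a, b = normalize(a), normalize(b)
    if len(a) != len(b) or a == b:
        return False
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if len(diffs) != 1:
        return False
    x, y = diffs[0]
    return x in CLASS and CLASS[x] == CLASS.get(y)


def related_pairs(roots):
    """All pairs of roots in the list that pass the phonetic test."""
    return [(a, b) for a, b in combinations(roots, 2)
            if phonetically_related(a, b)]


print(related_pairs(["כעס", "כעש", "אכל"]))
```

Swapping in a different set of classes only requires editing `GROUPS`, so the same comparison could be driven by a digitized version of Clark’s classes instead.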
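The distributional step represents each root by the roots it co-occurs with and weights those counts by positive pointwise mutual information, $\mathrm{PPMI}(w,c) = \max\bigl(0, \log_2 \frac{P(w,c)}{P(w)\,P(c)}\bigr)$. The following minimal sketch assumes the corpus is already reduced to one sequence of roots per verse and that co-occurrence is counted within a fixed symmetric window. The transliterated roots, the toy verses, and the window size are hypothetical choices for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence(verses, window=2):
    """Count (root, context root) pairs within +/- window positions per verse."""
    counts = defaultdict(Counter)
    for roots in verses:
        for i, w in enumerate(roots):
            lo, hi = max(0, i - window), min(len(roots), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[w][roots[j]] += 1
    return counts


def ppmi(counts):
    """Convert raw co-occurrence counts into sparse PPMI vectors."""
    total = sum(sum(c.values()) for c in counts.values())
    row = {w: sum(c.values()) for w, c in counts.items()}
    col = Counter()
    for c in counts.values():
        col.update(c)
    vectors = {}
    for w, c in counts.items():
        # P(w,c) / (P(w) P(c)) simplifies to n * total / (row[w] * col[ctx]).
        vectors[w] = {ctx: pmi for ctx, n in c.items()
                      if (pmi := math.log2(n * total / (row[w] * col[ctx]))) > 0}
    return vectors


def cosine(u, v):
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy "verses" of transliterated lexemes.
verses = [
    ["ish", "amr", "el"],
    ["ish", "dbr", "el"],
    ["ish", "akl", "lhm"],
    ["am", "akl", "lhm"],
]
vecs = ppmi(cooccurrence(verses, window=1))
print(cosine(vecs["amr"], vecs["dbr"]), cosine(vecs["amr"], vecs["akl"]))
```

In this toy corpus, "amr" and "dbr" appear in identical contexts, so their PPMI vectors coincide. This illustrates how roots that are used alike, such as two verbs of speaking, end up close together even without any shared gloss text.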
Anthology ID: 2020.lt4hala-1.10
Volume: Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages
Month: May
Year: 2020
Address: Marseille, France
Venues: LREC | LT4HALA | WS
Publisher: European Language Resources Association (ELRA)
Pages: 68–73
Language: English
URL: https://www.aclweb.org/anthology/2020.lt4hala-1.10
PDF: http://aclanthology.lst.uni-saarland.de/2020.lt4hala-1.10.pdf