Rethinking Phonotactic Complexity

Tiago Pimentel, Brian Roark, Ryan Cotterell


Abstract
In this work, we propose the use of phone-level language models to estimate phonotactic complexity, measured in bits per phoneme, which makes cross-linguistic comparison straightforward. We compare the entropy across languages using this simple measure, gaining insight into how complex different languages' phonotactics are. Finally, we show a very strong negative correlation between phonotactic complexity and the average length of words (Spearman rho = -0.744) when analysing a collection of 106 languages with 1016 basic concepts each.
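As a rough illustration of the two quantities named in the abstract, the sketch below computes bits per phoneme as the average negative log2-probability a phone-level model assigns to each symbol, and a Spearman rank correlation. It is not the authors' implementation: the add-one-smoothed bigram model, the choice to count the end-of-word symbol as a predicted phoneme, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical word-boundary symbols

def train_bigram(words):
    """Count phone bigrams; each word is a list of phone symbols."""
    bigrams, contexts, vocab = Counter(), Counter(), {EOS}
    for w in words:
        seq = [BOS] + list(w) + [EOS]
        vocab.update(w)
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab

def bits_per_phoneme(model, words):
    """Average -log2 p(phone | previous phone) over all predicted symbols.

    Assumes add-one smoothing and that every phone in `words` was seen
    in training; EOS is counted as a predicted symbol (an assumption).
    """
    bigrams, contexts, vocab = model
    total_bits, n = 0.0, 0
    for w in words:
        seq = [BOS] + list(w) + [EOS]
        for prev, cur in zip(seq, seq[1:]):
            p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))
            total_bits -= math.log2(p)
            n += 1
    return total_bits / n

def ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

In a cross-linguistic setting, one would call `bits_per_phoneme` once per language on held-out words, then pass the per-language values and the per-language average word lengths to `spearman`.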
Anthology ID:
W19-3628
Volume:
Proceedings of the 2019 Workshop on Widening NLP
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | WS | WiNLP
Publisher:
Association for Computational Linguistics
Pages:
88–90
URL:
https://www.aclweb.org/anthology/W19-3628