Finding Rising and Falling Words

Erik Tjong Kim Sang


Abstract
We examine two different methods for finding rising words (including neologisms) and falling words (including archaisms) in decades of magazine texts (millions of words) and in years of tweets (billions of words): one based on correlation coefficients between relative frequencies and time, and one based on comparing initial and final word frequencies of time intervals. We find that smoothing frequency scores improves the precision scores of both methods and that the correlation coefficients perform better on magazine text but worse on tweets. Since the two ranking methods find different words, they can be used side by side to study the behavior of words over time.
Anthology ID:
W16-4002
Volume:
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)
Month:
December
Year:
2016
Address:
Osaka, Japan
Venues:
LT4DH | WS
Publisher:
The COLING 2016 Organizing Committee
Pages:
2–9
URL:
https://www.aclweb.org/anthology/W16-4002
PDF:
http://aclanthology.lst.uni-saarland.de/W16-4002.pdf