Incorporating Emoji Descriptions Improves Tweet Classification

Abhishek Singh, Eduardo Blanco, Wei Jin


Abstract
Tweets are short messages that often include specialized language such as hashtags and emojis. In this paper, we present a simple strategy to process emojis: replace them with their natural language description and use pretrained word embeddings as normally done with standard words. We show that this strategy is more effective than using pretrained emoji embeddings for tweet classification. Specifically, we obtain new state-of-the-art results in irony detection and sentiment analysis despite our neural network is simpler than previous proposals.
Anthology ID:
N19-1214
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2096–2101
Language:
URL:
https://www.aclweb.org/anthology/N19-1214
DOI:
10.18653/v1/N19-1214
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/N19-1214.pdf
Video:
 https://vimeo.com/354236787