Private or Corporate? Predicting User Types on Twitter

Nikola Ljubešić, Darja Fišer


Abstract
In this paper we present a series of experiments on discriminating between private and corporate accounts on Twitter. We define features based on Twitter metadata, morphosyntactic tags and surface forms, and show that a simple bag-of-words model achieves the best results of any single classifier; these can, however, be improved by building a weighted soft ensemble of classifiers based on each feature type. Investigating the time and language dependence of each feature type delivers quite unexpected results: features based on metadata are neither time- nor language-insensitive, as the way the two user groups use the social network varies heavily through time and space.
Anthology ID: W16-3904
Volume: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
Month: December
Year: 2016
Address: Osaka, Japan
Venues: WNUT | WS
Publisher: The COLING 2016 Organizing Committee
Pages: 4–12
URL: https://www.aclweb.org/anthology/W16-3904
PDF: http://aclanthology.lst.uni-saarland.de/W16-3904.pdf