Tweet Classification without the Tweet: An Empirical Examination of User versus Document Attributes

Veronica Lynn, Salvatore Giorgi, Niranjan Balasubramanian, H. Andrew Schwartz


Abstract
NLP naturally puts a primary focus on leveraging document language, occasionally considering user attributes as supplemental. However, as we tackle more social scientific tasks, it is possible that user attributes are of primary importance and the document is supplemental. Here, we systematically investigate the predictive power of user-level features alone versus document-level features for document-level tasks. We first show that user attributes can sometimes carry more task-related information than the document itself. For example, a tweet-level stance detection model using only 13 user-level attributes (i.e. features that did not depend on the specific tweet) was able to obtain a higher F1 than the top-performing SemEval participant. We then consider multiple tasks and a wider range of user attributes, showing that the performance of strong document-only models can often be improved with user attributes (as in stance, sentiment, and sarcasm), with tasks that have stable “trait-like” outcomes (e.g. stance) benefiting more than those with frequently changing “state-like” outcomes (e.g. sentiment). These results not only support the growing work on integrating user factors into predictive systems, but also suggest that some of our NLP tasks might be better cast primarily as user-level (or human) tasks.
Anthology ID:
W19-2103
Volume:
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Venues:
NAACL | NLP+CSS | WS
Publisher:
Association for Computational Linguistics
Pages:
18–28
URL:
https://www.aclweb.org/anthology/W19-2103
DOI:
10.18653/v1/W19-2103
PDF:
http://aclanthology.lst.uni-saarland.de/W19-2103.pdf