Six Attributes of Unhealthy Conversations

Ilan Price, Jordan Gifford-Moore, Jory Flemming, Saul Musker, Maayan Roichman, Guillaume Sylvain, Nithum Thain, Lucas Dixon, Jeffrey Sorensen


Abstract
We present a new dataset of approximately 44,000 comments labelled by crowdworkers. Each comment is labelled as either ‘healthy’ or ‘unhealthy’, and also carries binary labels for the presence of six potentially ‘unhealthy’ sub-attributes: (1) hostile; (2) antagonistic, insulting, provocative or trolling; (3) dismissive; (4) condescending or patronising; (5) sarcastic; and/or (6) an unfair generalisation. Each label also has an associated confidence score. We argue that there is a need for datasets that enable research based on a broad notion of ‘unhealthy online conversation’. We build this typology to encompass a substantial proportion of the individual comments that contribute to unhealthy online conversation. For some of these attributes, this is the first publicly available dataset of this scale. We explore the quality of the dataset, present some summary statistics and initial models to illustrate the utility of this data, and highlight limitations and directions for further research.
Anthology ID:
2020.alw-1.15
Volume:
Proceedings of the Fourth Workshop on Online Abuse and Harms
Month:
November
Year:
2020
Address:
Online
Venues:
ALW | EMNLP
Publisher:
Association for Computational Linguistics
Pages:
114–124
URL:
https://www.aclweb.org/anthology/2020.alw-1.15
DOI:
10.18653/v1/2020.alw-1.15
PDF:
http://aclanthology.lst.uni-saarland.de/2020.alw-1.15.pdf
Optional supplementary material:
 2020.alw-1.15.OptionalSupplementaryMaterial.pdf