Exploring Deep Multimodal Fusion of Text and Photo for Hate Speech Classification

Fan Yang, Xiaochang Peng, Gargi Ghosh, Reshef Shilon, Hao Ma, Eider Moore, Goran Predovic


Abstract
Interactions among users on social network platforms are usually positive, constructive and insightful. However, sometimes people also get exposed to objectionable content such as hate speech, bullying, and verbal abuse etc. Most social platforms have explicit policy against hate speech because it creates an environment of intimidation and exclusion, and in some cases may promote real-world violence. As users’ interactions on today’s social networks involve multiple modalities, such as texts, images and videos, in this paper we explore the challenge of automatically identifying hate speech with deep multimodal technologies, extending previous research which mostly focuses on the text signal alone. We present a number of fusion approaches to integrate text and photo signals. We show that augmenting text with image embedding information immediately leads to a boost in performance, while applying additional attention fusion methods brings further improvement.
Anthology ID:
W19-3502
Volume:
Proceedings of the Third Workshop on Abusive Language Online
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | ALW | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
11–18
Language:
URL:
https://www.aclweb.org/anthology/W19-3502
DOI:
10.18653/v1/W19-3502
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W19-3502.pdf