UVA Wahoos at SemEval-2019 Task 6: Hate Speech Identification using Ensemble Machine Learning

Murugesan Ramakrishnan, Wlodek Zadrozny, Narges Tabari


Abstract
With the growth of social media, it has become increasingly common for people to hide behind a mask and abuse others. We attempt to detect tweets and comments that are malicious in intent and that target either an individual or a group. For SubTask A (classifying tweets as offensive vs. non-offensive), our best classifier achieves an accuracy of 83.14% and an F1-score of 0.7565 on the test data. For SubTask B (identifying whether an offensive tweet is targeted at an individual or a group), the classifier achieves an accuracy of 89.17% and an F1-score of 0.5885. The paper describes how we generated linguistic and semantic features to build an ensemble machine learning model. It also shows how accuracy changes when the model is trained on additional extracts from other sources (Facebook and more tweets).
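The abstract does not state how the ensemble combines its base classifiers or which F1 variant is reported. The sketch below assumes hard majority voting (one common combination rule, not necessarily the authors' method) and macro-averaged F1 (assumed here to be the task's official metric). It illustrates why accuracy can sit noticeably above F1 when offensive tweets are the minority class. The three model outputs, the gold labels, and the `NOT`/`OFF` tags are hypothetical toy data.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model predictions by hard (majority) voting.

    predictions[m][i] is model m's label for example i. On a tie,
    Counter.most_common keeps the label that was encountered first.
    """
    combined = []
    for i in range(len(predictions[0])):
        votes = Counter(model_preds[i] for model_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so the minority class counts equally."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: gold labels and three base models' predictions.
gold = ["NOT", "NOT", "NOT", "NOT", "OFF", "OFF"]
model_a = ["NOT", "NOT", "NOT", "OFF", "OFF", "NOT"]
model_b = ["NOT", "NOT", "NOT", "NOT", "NOT", "NOT"]
model_c = ["NOT", "OFF", "NOT", "NOT", "OFF", "NOT"]

combined = majority_vote([model_a, model_b, model_c])
print(combined)
print(f"accuracy = {accuracy(gold, combined):.4f}")
print(f"macro-F1 = {macro_f1(gold, combined):.4f}")
```

On this toy data the vote misses one of the two `OFF` tweets, so accuracy is 5/6 while macro-F1 is 7/9: the `OFF` class's low recall pulls the class-averaged score down even though most examples are classified correctly.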
Anthology ID: S19-2141
Volume: Proceedings of the 13th International Workshop on Semantic Evaluation
Month: June
Year: 2019
Address: Minneapolis, Minnesota, USA
Venue: *SEMEVAL
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 806–811
URL: https://www.aclweb.org/anthology/S19-2141
DOI: 10.18653/v1/S19-2141
PDF: http://aclanthology.lst.uni-saarland.de/S19-2141.pdf