KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media

Ali Safaya, Moutasem Abdullatif, Deniz Yuret


Abstract
In this paper, we describe our approach of combining pre-trained BERT models with Convolutional Neural Networks for sub-task A of the Multilingual Offensive Language Identification shared task (OffensEval 2020), part of SemEval-2020. We show that combining a CNN with BERT outperforms using BERT on its own, and we emphasize the importance of utilizing pre-trained language models for downstream tasks. Our system ranked 4th in Arabic with a macro-averaged F1-score of 0.897, 4th in Greek with a score of 0.843, and 3rd in Turkish with a score of 0.814. Additionally, we present ArabicBERT, a set of pre-trained transformer language models for Arabic that we share with the community.
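The core idea, a CNN classification head on top of BERT's token-level hidden states, can be sketched as follows. This is a minimal illustration, not the authors' exact architecture: the filter sizes, filter count, and hidden dimension here are assumptions, and in practice the hidden states would come from a pre-trained BERT encoder (e.g. one of the ArabicBERT models) rather than random tensors.

```python
import torch
import torch.nn as nn

class BertCNNHead(nn.Module):
    """CNN head over token-level encoder states (illustrative sketch).

    Filter sizes and counts are assumptions, not the paper's settings.
    """
    def __init__(self, hidden_size=768, num_filters=100,
                 kernel_sizes=(2, 3, 4), num_classes=2):
        super().__init__()
        # One 1-D convolution per kernel size, sliding over the sequence axis
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden_size, num_filters, k) for k in kernel_sizes
        )
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size), e.g. BERT's last layer
        x = hidden_states.transpose(1, 2)  # (batch, hidden_size, seq_len)
        # Convolve, apply ReLU, then max-pool over the sequence dimension
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))  # (batch, num_classes)

# Stand-in for real BERT output: batch of 4 sequences, 64 tokens, 768 dims
head = BertCNNHead()
logits = head(torch.randn(4, 64, 768))
print(logits.shape)  # torch.Size([4, 2])
```

In a full pipeline, the input tensor would be produced by a pre-trained encoder (e.g. via the `transformers` library), and the whole stack would be fine-tuned end-to-end on the offensive-language labels.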
Anthology ID:
2020.semeval-1.271
Volume:
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month:
December
Year:
2020
Address:
Barcelona (online)
Venues:
*SEMEVAL | COLING
SIG:
SIGLEX
Publisher:
International Committee for Computational Linguistics
Pages:
2054–2059
URL:
https://www.aclweb.org/anthology/2020.semeval-1.271
PDF:
http://aclanthology.lst.uni-saarland.de/2020.semeval-1.271.pdf