Michele Corazza


pdf bib
Hybrid Emoji-Based Masked Language Models for Zero-Shot Abusive Language Detection
Michele Corazza | Stefano Menini | Elena Cabrio | Sara Tonelli | Serena Villata
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent studies have demonstrated the effectiveness of cross-lingual language model pre-training on different NLP tasks, such as natural language inference and machine translation. In our work, we test this approach on social media data, which are particularly challenging to process within this framework, since the limited length of the textual messages and the irregularity of the language make it harder to learn meaningful encodings. More specifically, we propose a hybrid emoji-based Masked Language Model (MLM) to leverage the common information conveyed by emojis across different languages and improve the learned cross-lingual representation of short text messages, with the goal to perform zero- shot abusive language detection. We compare the results obtained with the original MLM to the ones obtained by our method, showing improved performance on German, Italian and Spanish.


pdf bib
A System to Monitor Cyberbullying based on Message Classification and Social Network Analysis
Stefano Menini | Giovanni Moretti | Michele Corazza | Elena Cabrio | Sara Tonelli | Serena Villata
Proceedings of the Third Workshop on Abusive Language Online

Social media platforms like Twitter and Instagram face a surge in cyberbullying phenomena against young users and need to develop scalable computational methods to limit the negative consequences of this kind of abuse. Despite the number of approaches recently proposed in the Natural Language Processing (NLP) research area for detecting different forms of abusive language, the issue of identifying cyberbullying phenomena at scale is still an unsolved problem. This is because of the need to couple abusive language detection on textual message with network analysis, so that repeated attacks against the same person can be identified. In this paper, we present a system to monitor cyberbullying phenomena by combining message classification and social network analysis. We evaluate the classification module on a data set built on Instagram messages, and we describe the cyberbullying monitoring user interface.

pdf bib
Comparing Automated Methods to Detect Explicit Content in Song Lyrics
Michael Fell | Elena Cabrio | Michele Corazza | Fabien Gandon
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The Parental Advisory Label (PAL) is a warning label that is placed on audio recordings in recognition of profanity or inappropriate references, with the intention of alerting parents of material potentially unsuitable for children. Since 2015, digital providers – such as iTunes, Spotify, Amazon Music and Deezer – also follow PAL guidelines and tag such tracks as “explicit”. Nowadays, such labelling is carried out mainly manually on voluntary basis, with the drawbacks of being time consuming and therefore costly, error prone and partly a subjective task. In this paper, we compare automated methods ranging from dictionary-based lookup to state-of-the-art deep neural networks to automatically detect explicit contents in English lyrics. We show that more complex models perform only slightly better on this task, and relying on a qualitative analysis of the data, we discuss the inherent hardness and subjectivity of the task.