André Ferreira Cruz


pdf bib
On Sentence Representations for Propaganda Detection: From Handcrafted Features to Word Embeddings
André Ferreira Cruz | Gil Rocha | Henrique Lopes Cardoso
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

Bias is ubiquitous in most online sources of natural language, from news media to social networks. Given the steady shift in news consumption behavior from traditional outlets to online sources, the automatic detection of propaganda, in which information is shaped to purposefully foster a predetermined agenda, is an increasingly crucial task. To this goal, we explore the task of sentence-level propaganda detection, and experiment with both handcrafted features and learned dense semantic representations. We also experiment with random undersampling of the majority class (non-propaganda) to curb the influence of class distribution on the system’s performance, leading to marked improvements on the minority class (propaganda). Our best performing system uses pre-trained ELMo word embeddings, followed by a bidirectional LSTM and an attention layer. We have submitted a 5-model ensemble of our best performing system to the NLP4IF shared task on sentence-level propaganda detection (team LIACC), achieving rank 10 among 25 participants, with 59.5 F1-score.