On Sentence Representations for Propaganda Detection: From Handcrafted Features to Word Embeddings

André Ferreira Cruz, Gil Rocha, Henrique Lopes Cardoso


Abstract
Bias is ubiquitous in most online sources of natural language, from news media to social networks. Given the steady shift in news consumption behavior from traditional outlets to online sources, the automatic detection of propaganda, in which information is shaped to purposefully foster a predetermined agenda, is an increasingly crucial task. To this goal, we explore the task of sentence-level propaganda detection, and experiment with both handcrafted features and learned dense semantic representations. We also experiment with random undersampling of the majority class (non-propaganda) to curb the influence of class distribution on the system’s performance, leading to marked improvements on the minority class (propaganda). Our best performing system uses pre-trained ELMo word embeddings, followed by a bidirectional LSTM and an attention layer. We have submitted a 5-model ensemble of our best performing system to the NLP4IF shared task on sentence-level propaganda detection (team LIACC), achieving rank 10 among 25 participants, with 59.5 F1-score.
Anthology ID:
D19-5015
Volume:
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda
Month:
November
Year:
2019
Address:
Hong Kong, China
Venues:
EMNLP | NLP4IF | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
107–112
Language:
URL:
https://www.aclweb.org/anthology/D19-5015
DOI:
10.18653/v1/D19-5015
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/D19-5015.pdf