Hitachi at SemEval-2020 Task 11: An Empirical Study of Pre-Trained Transformer Family for Propaganda Detection

Gaku Morio, Terufumi Morishita, Hiroaki Ozaki, Toshinori Miyoshi


Abstract
In this paper, we show our system for SemEval-2020 task 11, where we tackle propaganda span identification (SI) and technique classification (TC). We investigate heterogeneous pre-trained language models (PLMs) such as BERT, GPT-2, XLNet, XLM, RoBERTa, and XLM-RoBERTa for SI and TC fine-tuning, respectively. In large-scale experiments, we found that each of the language models has a characteristic property, and using an ensemble model with them is promising. Finally, the ensemble model was ranked 1st amongst 35 teams for SI and 3rd amongst 31 teams for TC.
Anthology ID:
2020.semeval-1.228
Volume:
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month:
December
Year:
2020
Address:
Barcelona (online)
Venues:
*SEMEVAL | COLING
SIG:
SIGLEX
Publisher:
International Committee for Computational Linguistics
Note:
Pages:
1739–1748
Language:
URL:
https://www.aclweb.org/anthology/2020.semeval-1.228
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2020.semeval-1.228.pdf