Hitachi at SemEval-2020 Task 8: Simple but Effective Modality Ensemble for Meme Emotion Recognition

Terufumi Morishita, Gaku Morio, Shota Horiguchi, Hiroaki Ozaki, Toshinori Miyoshi


Abstract
Users of social networking services often share their emotions via multi-modal content, usually images paired with text embedded in them. SemEval-2020 Task 8, Memotion Analysis, aims at automatically recognizing the emotions expressed in these so-called internet memes. In this paper, we propose a simple but effective Modality Ensemble that incorporates independently trained visual and textual deep-learning models, rather than building a single multi-modal joint network. To this end, we first fine-tune four pre-trained visual models (i.e., Inception-ResNet, PolyNet, SENet, and PNASNet) and four textual models (i.e., BERT, GPT-2, Transformer-XL, and XLNet). Then, we fuse their predictions with ensemble methods to effectively capture cross-modal correlations. Experiments on the development set show that visual and textual features aided each other, especially in subtask-C, and consequently our system ranked 2nd on subtask-C.
Anthology ID: 2020.semeval-1.149
Volume: Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month: December
Year: 2020
Address: Barcelona (online)
Venues: *SEMEVAL | COLING
SIG: SIGLEX
Publisher: International Committee for Computational Linguistics
Pages: 1126–1134
URL: https://www.aclweb.org/anthology/2020.semeval-1.149
PDF: http://aclanthology.lst.uni-saarland.de/2020.semeval-1.149.pdf