BIGODM System in the Social Media Mining for Health Applications Shared Task 2019

Chen-Kai Wang, Hong-Jie Dai, Bo-Hung Wang


Abstract
In this study, we describe our methods to automatically classify Twitter posts conveying events of adverse drug reaction (ADR). Based on our previous experience in tackling the ADR classification task, we empirically applied the vote-based under-sampling ensemble approach along with linear support vector machine (SVM) to develop our classifiers as part of our participation in ACL 2019 Social Media Mining for Health Applications (SMM4H) shared task 1. The best-performed model on the test sets were trained on a merged corpus consisting of the datasets released by SMM4H 2017 and 2019. By using VUE, the corpus was randomly under-sampled with 2:1 ratio between the negative and positive classes to create an ensemble using the linear kernel trained with features including bag-of-word, domain knowledge, negation and word embedding. The best performing model achieved an F-measure of 0.551 which is about 5% higher than the average F-scores of 16 teams.
Anthology ID:
W19-3220
Volume:
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
117–119
Language:
URL:
https://www.aclweb.org/anthology/W19-3220
DOI:
10.18653/v1/W19-3220
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W19-3220.pdf