Bernal Jiménez Gutiérrez


2020

Document Classification for COVID-19 Literature
Bernal Jiménez Gutiérrez | Juncheng Zeng | Dongdong Zhang | Ping Zhang | Yu Su
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

The global pandemic has made it more important than ever to quickly and accurately retrieve relevant scientific literature for effective consumption by researchers in a wide range of fields. We provide an analysis of several multi-label document classification models on the LitCovid dataset. We find that pre-trained language models outperform other models in both low- and high-data regimes, achieving a maximum F1 score of around 86%. We note that even the highest-performing models still struggle with label correlation, distraction from introductory text, and generalization to CORD-19. Both data and code are available on GitHub.