GUIR @ LongSumm 2020: Learning to Generate Long Summaries from Scientific Documents

Sajad Sotudeh Gharebagh, Arman Cohan, Nazli Goharian


Abstract
This paper presents our methods for the LongSumm 2020 Shared Task on Generating Long Summaries for Scientific Documents, where the task is to generate long summaries given a set of scientific papers provided by the organizers. We explore three main approaches for this task: 1. an extractive approach using a BERT-based summarization model; 2. a two-stage model that additionally includes an abstraction step using BART; and 3. a new multi-tasking approach that incorporates document structure into the summarizer. We found that our new multi-tasking approach outperforms the two other methods by large margins. Among the 9 participants in the shared task, our best model ranks first according to ROUGE-1 score (53.11%) while staying competitive in terms of ROUGE-2.
Anthology ID: 2020.sdp-1.41
Volume: Proceedings of the First Workshop on Scholarly Document Processing
Month: November
Year: 2020
Address: Online
Venues: EMNLP | sdp
Publisher: Association for Computational Linguistics
Pages: 356–361
URL: https://www.aclweb.org/anthology/2020.sdp-1.41
DOI: 10.18653/v1/2020.sdp-1.41
PDF: http://aclanthology.lst.uni-saarland.de/2020.sdp-1.41.pdf