Gold Corpus for Telegraphic Summarization

Chanakya Malireddy, Srivenkata N M Somisetty, Manish Shrivastava


Abstract
Most extractive summarization techniques operate by ranking all the source sentences and then selecting the top-ranked sentences as the summary. Such methods are known to produce good summaries, especially when applied to news articles and scientific texts. However, they do not fare as well when applied to texts such as fictional narratives, which lack a single central or recurrent theme. This is because the information or plot of a story is usually spread across several sentences. In this paper, we discuss a different summarization technique called Telegraphic Summarization. Here, instead of selecting whole sentences, we pick short segments of text spread across sentences as the summary. We have tailored a set of guidelines for creating such summaries and used them to annotate a gold corpus of 200 English short stories.
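The abstract contrasts ranking whole sentences with picking short segments spread across sentences. The toy Python sketch below illustrates only that contrast. It is not the authors' method, because the paper describes annotation guidelines for a human-created gold corpus rather than an algorithm. The frequency-based word scores, the small stopword list, and the function names (`extractive_summary`, `telegraphic_summary`) are hypothetical choices made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list, used only for this illustration.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "on", "was", "were",
             "is", "he", "she", "it", "his", "her", "they", "at", "with", "that"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercased alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", sentence.lower())


def word_scores(text: str) -> Counter:
    """Assumed scoring: a content word's frequency across the whole text.
    Stopwords are absent, so looking them up in the Counter yields 0."""
    return Counter(w for w in tokenize(text) if w not in STOPWORDS)


def extractive_summary(text: str, k: int) -> list[str]:
    """Sentence-level extraction: rank every sentence by the summed score
    of its words and keep the top k, restored to source order."""
    scores = word_scores(text)
    sentences = split_sentences(text)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sum(scores[w] for w in tokenize(sentences[i])),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def telegraphic_summary(text: str, min_score: int) -> list[str]:
    """Segment-level selection (toy stand-in): from every sentence keep only
    the words whose score reaches min_score (assumed >= 1), producing short
    fragments that span the whole story."""
    scores = word_scores(text)
    segments = []
    for sentence in split_sentences(text):
        kept = [w for w in tokenize(sentence) if scores[w] >= min_score]
        if kept:
            segments.append(" ".join(kept))
    return segments


if __name__ == "__main__":
    story = ("Ravi found a map. The map showed a cave. "
             "Ravi walked to the cave at night. Inside the cave Ravi found gold.")
    # Prints the last two sentences only.
    print(extractive_summary(story, 2))
    # Prints ['ravi found map', 'map cave', 'ravi cave', 'cave ravi found'].
    print(telegraphic_summary(story, 2))
```

On this four-sentence toy story, the sentence-level summary keeps only the last two sentences and drops the map entirely, while the segment-level output takes a short fragment from every sentence. This mirrors the abstract's point that a story's plot is spread across sentences.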
Anthology ID: W18-3810
Volume: Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
Month: August
Year: 2018
Address: Santa Fe, New Mexico, USA
Venues: COLING | LR4NLP | WS
Publisher: Association for Computational Linguistics
Pages: 71–77
URL: https://www.aclweb.org/anthology/W18-3810
PDF: http://aclanthology.lst.uni-saarland.de/W18-3810.pdf