Learning to Score System Summaries for Better Content Selection Evaluation

Maxime Peyrard, Teresa Botschen, Iryna Gurevych


Abstract
The evaluation of summaries is a challenging but crucial task in the summarization field. In this work, we propose to learn an automatic scoring metric based on the human judgments available as part of classical summarization datasets like TAC-2008 and TAC-2009. Any existing automatic scoring metric can be included as a feature; the model learns the combination exhibiting the best correlation with human judgments. The reliability of the new metric is tested in a further manual evaluation in which we ask humans to evaluate summaries covering the whole scoring spectrum of the metric. We release the trained metric as an open-source tool.
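The idea of learning a combination of existing metrics against human judgments can be sketched as follows. This is a minimal illustration, not the paper's actual model: it assumes a simple least-squares linear combination, and the feature names and synthetic data are placeholders.

```python
import numpy as np

def learn_metric_combination(features, human_scores):
    """Fit a linear combination of existing automatic metric scores
    to human judgments via least squares (a simplified stand-in for
    the learned metric described in the abstract)."""
    X = np.column_stack([features, np.ones(len(features))])  # add bias column
    weights, *_ = np.linalg.lstsq(X, human_scores, rcond=None)
    return weights

def score(weights, metric_scores):
    """Score one summary from its vector of automatic metric scores."""
    return float(np.dot(weights[:-1], metric_scores) + weights[-1])

# Toy data: rows = summaries, columns = hypothetical feature metrics
# (e.g. ROUGE-1, ROUGE-2, JS divergence -- illustrative only).
rng = np.random.default_rng(0)
X = rng.random((50, 3))
# Synthetic "human judgments" that happen to be linear in the features.
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] - 1.0 * X[:, 2] + 0.1

w = learn_metric_combination(X, y)
pred = np.array([score(w, row) for row in X])
# Correlation of the learned metric with the (synthetic) human judgments.
corr = np.corrcoef(pred, y)[0, 1]
```

In practice one would optimize for correlation on held-out data rather than plain squared error, but least squares keeps the sketch self-contained.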
Anthology ID: W17-4510
Volume: Proceedings of the Workshop on New Frontiers in Summarization
Month: September
Year: 2017
Address: Copenhagen, Denmark
Venue: WS
Publisher: Association for Computational Linguistics
Pages: 74–84
URL: https://www.aclweb.org/anthology/W17-4510
DOI: 10.18653/v1/W17-4510
PDF: http://aclanthology.lst.uni-saarland.de/W17-4510.pdf