SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics

Daniel Deutsch, Dan Roth


Abstract
We present SacreROUGE, an open-source library for using and developing summarization evaluation metrics. SacreROUGE removes many obstacles that researchers face when using or developing metrics: (1) the library provides Python wrappers around the official implementations of existing evaluation metrics so that they share a common, easy-to-use interface; (2) it provides functionality to evaluate how well any metric implemented in the library correlates with human-annotated judgments, so no additional code needs to be written for a new evaluation metric; and (3) it includes scripts for loading datasets that contain human judgments so they can easily be used for evaluation. This work describes the design of the library, including the core Metric interface, the command-line API for evaluating summarization models and metrics, and the scripts to load and reformat publicly available datasets. The development of SacreROUGE is ongoing and open to contributions from the community.
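To make the idea of a shared metric interface concrete, here is a minimal, self-contained sketch of what such a design could look like. All names below (`Metric`, `UnigramOverlap`, `score`) are hypothetical illustrations of the general pattern the abstract describes, not SacreROUGE's actual API:

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Metric(ABC):
    """Hypothetical common interface: every metric scores a candidate
    summary against one or more references and returns named values."""

    @abstractmethod
    def score(self, summary: str, references: List[str]) -> Dict[str, float]:
        ...


class UnigramOverlap(Metric):
    """Toy metric for illustration: best unigram recall over references."""

    def score(self, summary: str, references: List[str]) -> Dict[str, float]:
        summary_tokens = set(summary.lower().split())
        recalls = []
        for ref in references:
            ref_tokens = set(ref.lower().split())
            # Fraction of reference unigrams that appear in the summary.
            recalls.append(len(summary_tokens & ref_tokens) / len(ref_tokens))
        return {"unigram_recall": max(recalls)}


metric = UnigramOverlap()
scores = metric.score("the cat sat", ["the cat sat on the mat"])
```

Because every metric returns a dictionary of named scores through the same `score` method, downstream tooling (e.g., correlation analysis against human judgments) can treat all metrics uniformly, which is the property the library's design aims for.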
Anthology ID: 2020.nlposs-1.17
Volume: Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)
Month: November
Year: 2020
Address: Online
Venues: EMNLP | NLPOSS
Publisher: Association for Computational Linguistics
Pages: 120–125
URL: https://www.aclweb.org/anthology/2020.nlposs-1.17
DOI: 10.18653/v1/2020.nlposs-1.17
PDF: http://aclanthology.lst.uni-saarland.de/2020.nlposs-1.17.pdf
Optional supplementary material: 2020.nlposs-1.17.OptionalSupplementaryMaterial.pdf