The Summary Evaluation Task in the MultiLing - RANLP 2019 Workshop

George Giannakopoulos, Nikiforos Pittaras


Abstract
This report covers the summarization evaluation task proposed to the summarization community via the MultiLing 2019 Workshop of the RANLP 2019 conference. The task aims to encourage the development of automatic summarization evaluation methods closely aligned with manual, human-authored summary grades and judgements. A multilingual setting is adopted, building upon a corpus of Wikinews articles across six languages (English, Arabic, Romanian, Greek, Spanish and Czech). The evaluation utilizes human (golden) and machine-generated (peer) summaries, which have been assigned human evaluation scores in previous MultiLing tasks. Using these resources, the original corpus is augmented with synthetic data, combining summary texts under three different strategies (reorder, merge and replace), each engineered to introduce noise into the summary in a controlled and quantifiable way. We expect that such data can help extract and highlight useful attributes for summary quality estimation, aiding the creation of data-driven automatic methods with increased correlation to human summary evaluations across domains and languages. This paper provides a brief description of the summary evaluation task, the data generation protocol, and the resources made available by the MultiLing community towards improving automatic summarization evaluation.
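The abstract names three noise-injection strategies (reorder, merge, replace) but does not specify them in detail. The sketch below is one plausible, hypothetical interpretation, not the authors' actual protocol: each function takes a summary as a list of sentences and a `noise` parameter controlling the fraction of the text that is perturbed, so the amount of degradation is quantifiable. All function names, signatures, and the `donor` summary argument are illustrative assumptions.

```python
import random


def reorder(sentences, noise=0.5, seed=0):
    """Hypothetical 'reorder' strategy: shuffle a noise-controlled fraction
    of the summary's sentences, degrading coherence while keeping content."""
    rng = random.Random(seed)
    out = list(sentences)
    k = max(2, round(len(out) * noise))        # how many sentences to permute
    idx = sorted(rng.sample(range(len(out)), k))
    picked = [out[i] for i in idx]
    rng.shuffle(picked)
    for i, s in zip(idx, picked):
        out[i] = s
    return out


def replace(sentences, donor, noise=0.5, seed=0):
    """Hypothetical 'replace' strategy: substitute a fraction of sentences
    with sentences drawn from another (donor) summary, hurting relevance."""
    rng = random.Random(seed)
    out = list(sentences)
    k = round(len(out) * noise)                # how many sentences to swap out
    for i in rng.sample(range(len(out)), k):
        out[i] = rng.choice(donor)
    return out


def merge(sentences, donor, noise=0.5, seed=0):
    """Hypothetical 'merge' strategy: interleave donor sentences into the
    summary, inflating its length and diluting its focus."""
    rng = random.Random(seed)
    out = list(sentences)
    k = min(round(len(sentences) * noise), len(donor))
    for s in rng.sample(donor, k):
        out.insert(rng.randrange(len(out) + 1), s)
    return out
```

Because each function is driven by a single `noise` fraction and a fixed seed, perturbed summaries can be generated reproducibly at several known degradation levels and paired with proportionally lowered quality scores for training.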
Anthology ID:
W19-8903
Volume:
Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Venues:
RANLP | WS
Publisher:
INCOMA Ltd.
Pages:
11–16
URL:
https://www.aclweb.org/anthology/W19-8903
DOI:
10.26615/978-954-452-058-8_003
PDF:
http://aclanthology.lst.uni-saarland.de/W19-8903.pdf