Hadar Ronen


2019

pdf bib
Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation
Ori Shapira | David Gabay | Yang Gao | Hadar Ronen | Ramakanth Pasunuru | Mohit Bansal | Yael Amsterdamer | Ido Dagan
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.

2018

pdf bib
Evaluating Multiple System Summary Lengths: A Case Study
Ori Shapira | David Gabay | Hadar Ronen | Judit Bar-Ilan | Yael Amsterdamer | Ani Nenkova | Ido Dagan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Practical summarization systems are expected to produce summaries of varying lengths, per user needs. While a couple of early summarization benchmarks tested systems across multiple summary lengths, this practice was mostly abandoned due to the assumed cost of producing reference summaries of multiple lengths. In this paper, we raise the research question of whether reference summaries of a single length can be used to reliably evaluate system summaries of multiple lengths. For that, we have analyzed a couple of datasets as a case study, using several variants of the ROUGE metric that are standard in summarization evaluation. Our findings indicate that the evaluation protocol in question is indeed competitive. This result paves the way to practically evaluating varying-length summaries with simple, possibly existing, summarization benchmarks.

2017

pdf bib
Interactive Abstractive Summarization for Event News Tweets
Ori Shapira | Hadar Ronen | Meni Adler | Yael Amsterdamer | Judit Bar-Ilan | Ido Dagan
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present a novel interactive summarization system that is based on abstractive summarization, derived from a recent consolidated knowledge representation for multiple texts. We incorporate a couple of interaction mechanisms, providing a bullet-style summary while allowing to attain the most important information first and interactively drill down to more specific details. A usability study of our implementation, for event news tweets, suggests the utility of our approach for text exploration.