TY - GEN
T1 - Evaluating multiple system summary lengths
T2 - 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
AU - Shapira, Ori
AU - Gabay, David
AU - Ronen, Hadar
AU - Bar-Ilan, Judit
AU - Amsterdamer, Yael
AU - Nenkova, Ani
AU - Dagan, Ido
N1 - Publisher Copyright: © 2018 Association for Computational Linguistics
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Practical summarization systems are expected to produce summaries of varying lengths, per user needs. While a couple of early summarization benchmarks tested systems across multiple summary lengths, this practice was mostly abandoned due to the assumed cost of producing reference summaries of multiple lengths. In this paper, we raise the research question of whether reference summaries of a single length can be used to reliably evaluate system summaries of multiple lengths. For that, we have analyzed a couple of datasets as a case study, using several variants of the ROUGE metric that are standard in summarization evaluation. Our findings indicate that the evaluation protocol in question is indeed competitive. This result paves the way to practically evaluating varying-length summaries with simple, possibly existing, summarization benchmarks.
UR - http://www.scopus.com/inward/record.url?scp=85081742307&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
SP - 774
EP - 778
BT - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
A2 - Riloff, Ellen
A2 - Chiang, David
A2 - Hockenmaier, Julia
A2 - Tsujii, Jun'ichi
Y2 - 31 October 2018 through 4 November 2018
ER -