TY - GEN
T1 - Crowdsourcing lightweight pyramids for manual summary evaluation
AU - Shapira, Ori
AU - Gabay, David
AU - Gao, Yang
AU - Ronen, Hadar
AU - Pasunuru, Ramakanth
AU - Bansal, Mohit
AU - Amsterdamer, Yael
AU - Dagan, Ido
N1 - Publisher Copyright: © 2019 Association for Computational Linguistics
PY - 2019/1/1
Y1 - 2019/1/1
AB - Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.
UR - http://www.scopus.com/inward/record.url?scp=85084294321&partnerID=8YFLogxK
M3 - Conference contribution
T3 - NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
SP - 682
EP - 687
BT - Long and Short Papers
PB - Association for Computational Linguistics (ACL)
T2 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019
Y2 - 2 June 2019 through 7 June 2019
ER -