TY - GEN
T1 - Krimping texts for better summarization
AU - Litvak, Marina
AU - Vanetik, Natalia
AU - Last, Mark
N1 - Publisher Copyright: © 2015 Association for Computational Linguistics.
PY - 2015/1/1
Y1 - 2015/1/1
N2 - Automated text summarization aims to extract essential information from an original text and present it in a minimal, often predefined, number of words. In this paper, we introduce a new approach to unsupervised extractive summarization based on the Minimum Description Length (MDL) principle, using the Krimp dataset compression algorithm (Vreeken et al., 2011). Our approach represents a text as a transactional dataset, with sentences as transactions, and then describes it by itemsets that stand for frequent sequences of words. The summary is then compiled from sentences that compress (and, as such, best describe) the document. The summarization problem is reduced to maximal coverage, following the assumption that a summary that best describes the original text should cover most of the word sequences describing the document. We solve it with a greedy algorithm and present the evaluation results.
AB - Automated text summarization aims to extract essential information from an original text and present it in a minimal, often predefined, number of words. In this paper, we introduce a new approach to unsupervised extractive summarization based on the Minimum Description Length (MDL) principle, using the Krimp dataset compression algorithm (Vreeken et al., 2011). Our approach represents a text as a transactional dataset, with sentences as transactions, and then describes it by itemsets that stand for frequent sequences of words. The summary is then compiled from sentences that compress (and, as such, best describe) the document. The summarization problem is reduced to maximal coverage, following the assumption that a summary that best describes the original text should cover most of the word sequences describing the document. We solve it with a greedy algorithm and present the evaluation results.
UR - http://www.scopus.com/inward/record.url?scp=84959864175&partnerID=8YFLogxK
U2 - https://doi.org/10.18653/v1/d15-1223
DO - https://doi.org/10.18653/v1/d15-1223
M3 - Conference contribution
T3 - Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing
SP - 1931
EP - 1935
BT - Conference Proceedings - EMNLP 2015
PB - Association for Computational Linguistics (ACL)
T2 - Conference on Empirical Methods in Natural Language Processing, EMNLP 2015
Y2 - 17 September 2015 through 21 September 2015
ER -