Abstract
Extractive summarization of text documents usually consists of ranking the document sentences and extracting the top-ranked sentences subject to a summary length constraint. In this paper, we explore the contribution of various supervised learning algorithms to the sentence ranking task. For this purpose, we introduce a novel sentence ranking methodology based on the similarity score between a candidate sentence and benchmark summaries. Our experiments are performed on three benchmark summarization corpora: DUC-2002, DUC-2007 and MultiLing-2013. The popular linear regression model achieved the best results on all evaluated datasets. Additionally, the linear regression model that included POS (Part-of-Speech)-based features outperformed the one that used statistical features only.
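
The ranking-and-extraction pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the feature set, or the selection procedure, so everything here is an assumption made for illustration: the function names (`tokenize`, `similarity_target`, `extract_summary`) are hypothetical, Jaccard word overlap stands in for the paper's similarity score, and a greedy word-budget selection stands in for the length constraint. In a supervised setting, the similarity values would serve as regression targets during training, and a trained model's predictions would replace them at summarization time.

```python
import re


def tokenize(text):
    """Lowercase a text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity_target(sentence, reference_summaries):
    """Mean Jaccard overlap between a sentence and each reference summary.

    Assumption: Jaccard overlap is a stand-in for the similarity score;
    the abstract does not say which measure the paper uses.
    """
    sent = tokenize(sentence)
    scores = []
    for ref in reference_summaries:
        ref_tokens = tokenize(ref)
        union = sent | ref_tokens
        scores.append(len(sent & ref_tokens) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def extract_summary(sentences, scores, max_words):
    """Greedily pick the highest-scoring sentences that fit the word budget.

    Selected sentences are returned in their original document order.
    """
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n_words = len(sentences[i].split())
        if used + n_words <= max_words:
            chosen.append(i)
            used += n_words
    return [sentences[i] for i in sorted(chosen)]


# Toy data, invented purely for illustration.
sentences = [
    "The cat sat on the mat.",
    "Stocks fell sharply on Monday.",
    "A cat was on the mat.",
]
references = ["The cat sat on the mat."]

# Training-time targets; a regressor would learn to predict these from features.
targets = [similarity_target(s, references) for s in sentences]
summary = extract_summary(sentences, targets, max_words=12)
```

Here the targets are used directly as ranking scores to keep the sketch short; a learned model (for example, a linear regression over statistical and POS-based features) would supply the scores for documents without reference summaries.
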
Original language | American English |
---|---|
Pages (from-to) | 1-8 |
Number of pages | 8 |
Journal | CEUR Workshop Proceedings |
Volume | 1646 |
State | Published - 1 Jan 2016 |
Event | 2016 Workshop on Interactions between Data Mining and Natural Language Processing, DMNLP 2016 - Riva del Garda, Italy; Duration: 23 Sep 2016 → … |
Keywords
- Part-of-speech tagging
- Regression
- Sentence ranking
- Supervised learning
- Text summarization
All Science Journal Classification (ASJC) codes
- General Computer Science