Using machine learning methods and linguistic features in single-document extractive summarization

Alexander Dlikman, Mark Last

Research output: Contribution to journal › Conference article › peer-review

Abstract

Extractive summarization of text documents usually consists of ranking the sentences of a document and extracting the top-ranked sentences subject to the summary length constraints. In this paper, we explore the contribution of various supervised learning algorithms to the sentence ranking task. For this purpose, we introduce a novel sentence ranking methodology based on the similarity score between a candidate sentence and benchmark summaries. Our experiments are performed on three benchmark summarization corpora: DUC-2002, DUC-2007 and MultiLing-2013. The popular linear regression model achieved the best results on all evaluated datasets. Additionally, the linear regression model that included POS (Part-of-Speech)-based features outperformed the model that used statistical features only.
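The abstract describes a pipeline: label each training sentence with its similarity to the benchmark summaries, learn a regressor from sentence features to that label, then rank sentences and extract the top ones under a length limit. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The similarity measure (a ROUGE-1-precision-like unigram overlap), averaging over summaries, the features (relative position, length, and noun/verb/adjective ratios from Penn Treebank tags supplied by any tagger), the word budget, and all function names are hypothetical choices.

```python
"""Hypothetical sketch of regression-based sentence ranking for extractive summarization."""
from collections import Counter

import numpy as np


def unigram_overlap(sentence, summary):
    """Assumed similarity: share of the sentence's tokens (clipped counts)
    that also occur in one reference summary (ROUGE-1-precision-like)."""
    sent = Counter(sentence.lower().split())
    ref = Counter(summary.lower().split())
    total = sum(sent.values())
    if total == 0:
        return 0.0
    return sum(min(count, ref[word]) for word, count in sent.items()) / total


def target_score(sentence, summaries):
    """Training label: mean similarity to all benchmark summaries (assumed aggregation)."""
    return sum(unigram_overlap(sentence, s) for s in summaries) / len(summaries)


def sentence_features(sentence, index, n_sentences, pos_tags=None):
    """Statistical features (relative position, length in tokens), optionally
    extended with POS ratios from Penn Treebank tags produced by any tagger."""
    feats = [1.0 - index / n_sentences, float(len(sentence.split()))]
    if pos_tags is not None:
        n = max(len(pos_tags), 1)
        feats += [
            sum(t.startswith("NN") for t in pos_tags) / n,  # noun ratio
            sum(t.startswith("VB") for t in pos_tags) / n,  # verb ratio
            sum(t.startswith("JJ") for t in pos_tags) / n,  # adjective ratio
        ]
    return feats


def _with_bias(X):
    """Prepend a column of ones so the model learns an intercept."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])


def fit_linear_regression(X, y):
    """Ordinary least squares; returns [intercept, w1, ..., wk]."""
    weights, *_ = np.linalg.lstsq(_with_bias(X), np.asarray(y, dtype=float), rcond=None)
    return weights


def predict(weights, X):
    """Predicted similarity score for each feature row."""
    return _with_bias(X) @ weights


def extract_summary(sentences, scores, max_words):
    """Greedily take the highest-scoring sentences that still fit the word
    budget, then return them in original document order."""
    chosen, used = [], 0
    for i in np.argsort(-np.asarray(scores, dtype=float), kind="stable"):
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(int(i))
            used += length
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = [
        "The council approved the new budget on Monday.",
        "Weather was mild.",
        "The budget increases school funding by ten percent.",
    ]
    references = ["The council approved a budget that increases school funding."]
    X = [sentence_features(s, i, len(doc)) for i, s in enumerate(doc)]
    y = [target_score(s, references) for s in doc]
    # In practice, fit on training documents and rank sentences of unseen ones.
    w = fit_linear_regression(X, y)
    print(extract_summary(doc, predict(w, X), max_words=12))
```

Because the regressor is isolated in `fit_linear_regression`/`predict`, comparing other supervised learners, as the abstract describes, would only mean swapping those two functions. Comparing POS-based and statistical-only models amounts to toggling the `pos_tags` argument.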

Original language: American English
Pages (from-to): 1-8
Number of pages: 8
Journal: CEUR Workshop Proceedings
Volume: 1646
State: Published - 1 Jan 2016
Event: 2016 Workshop on Interactions between Data Mining and Natural Language Processing, DMNLP 2016 - Riva del Garda, Italy
Duration: 23 Sep 2016 → …

Keywords

  • Part-of-speech tagging
  • Regression
  • Sentence ranking
  • Supervised learning
  • Text summarization

All Science Journal Classification (ASJC) codes

  • General Computer Science
