Optimal partitioning of data chunks in deduplication systems

M. Hirsch, A. Ish-Shalom, S. T. Klein

Research output: Contribution to journal › Article › peer-review


Deduplication is a special case of data compression in which repeated chunks of data are stored only once. For very large chunks, this process may be applied even if the chunks are similar and not necessarily identical; the encoding of duplicate data then consists of a sequence of pointers to matching parts. However, not every pointer is worth keeping, since each one incurs some storage overhead. A linear-time, sub-optimal solution of this partition problem is presented, followed by an optimal solution with cubic time complexity that requires quadratic space.
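
The trade-off described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's cost model, so everything here is an assumption made for illustration: a chunk is a list of hypothetical `("match" | "literal", length)` segments, a kept match costs a fixed `pointer_cost`, and each maximal run of literal bytes pays one `literal_header` plus its length. The local rule and the two-state dynamic program below belong to this toy model only; they are not the linear or cubic algorithms of the paper.

```python
from typing import List, Tuple

# Hypothetical representation: ("match" | "literal", length in bytes).
Segment = Tuple[str, int]


def greedy_keep(segments: List[Segment], pointer_cost: int) -> List[bool]:
    """Local rule (assumed): keep a pointer only if the match is longer than the pointer."""
    return [kind == "match" and length > pointer_cost for kind, length in segments]


def encoded_size(segments: List[Segment], keep: List[bool],
                 pointer_cost: int, literal_header: int) -> int:
    """Size under the toy model: a kept match costs one pointer; everything
    else is stored as literal bytes, and each maximal literal run pays one header."""
    total = 0
    in_literal_run = False
    for (kind, length), kept in zip(segments, keep):
        if kind == "match" and kept:
            total += pointer_cost
            in_literal_run = False
        else:
            if not in_literal_run:
                total += literal_header  # a new literal run starts here
                in_literal_run = True
            total += length
    return total


def optimal_size(segments: List[Segment], pointer_cost: int, literal_header: int):
    """Two-state dynamic program for the toy model only.
    State = whether the last emitted element was a pointer or a literal run."""
    inf = float("inf")
    after_pointer, in_literal = 0, inf  # nothing emitted yet: no open literal run
    for kind, length in segments:
        # Store this segment as literal bytes, extending or opening a run.
        as_literal = min(after_pointer + literal_header, in_literal) + length
        # Store this segment as a pointer (only possible for a match).
        as_pointer = min(after_pointer, in_literal) + pointer_cost if kind == "match" else inf
        after_pointer, in_literal = as_pointer, as_literal
    return min(after_pointer, in_literal)


segments = [("literal", 10), ("match", 6), ("literal", 10)]
keep = greedy_keep(segments, pointer_cost=5)
greedy = encoded_size(segments, keep, pointer_cost=5, literal_header=4)
best = optimal_size(segments, pointer_cost=5, literal_header=4)
```

In this example the local rule keeps the 6-byte match because it is longer than the 5-byte pointer, yet dropping it merges the two neighbouring literal runs and saves a header, so the encoding that looks best segment by segment is not the smallest overall. This is the kind of interaction that makes the choice of pointers a partitioning problem and not a per-pointer decision.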

Original language: English
Pages (from-to): 104-114
Number of pages: 11
Journal: Discrete Applied Mathematics
State: Published - 30 Oct 2016


Keywords

  • Deduplication
  • Dynamic programming
  • Partitioning data chunks

All Science Journal Classification (ASJC) codes

  • Discrete Mathematics and Combinatorics
  • Applied Mathematics

