Quasi-distinct parsing and optimal compression methods

Amihood Amir, Yonatan Aumann, Avivit Levy, Yuri Roshko

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, the optimality proof of Ziv-Lempel coding is re-studied, and a more general compression optimality theorem is derived. In particular, the property of quasi-distinct parsing is defined. This property allows infinitely many repetitions of phrases in the parsing as long as the total number of repetitions is o(n/log n), where n is the length of the parsed string. The quasi-distinct parsing property is weaker than the distinct parsing property used in the original proof, which does not allow any repetition of phrases in the parsing. Yet we show that the theorem holds with this weaker property as well. This provides a better understanding of the optimality proof of Ziv-Lempel coding, together with a new tool for proving the optimality of other compression schemes, one that is applicable to a much wider family of codes. To demonstrate a possible use of this generalization, a new coding method, the Arithmetic Progression Tree (APT) coding, is presented. This new coding method is based on a principle that is very different from that of Ziv-Lempel coding. Nevertheless, the APT coding is analyzed in this paper and, using the generalized theorem, is shown to be asymptotically optimal up to a constant factor, provided the APT quasi-distinctness hypothesis holds. Empirical evidence that this hypothesis holds is also given.
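The distinction between distinct and quasi-distinct parsing can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `lz78_parse` and `count_repetitions` are hypothetical, and the incremental parser is the textbook LZ78-style rule, assumed here only as a familiar example of a parsing whose phrases are distinct (except possibly the last). The sketch only counts repeated phrases in a given parsing; whether a parsing scheme is quasi-distinct depends on how that count grows with n, as bounded in the abstract.

```python
from collections import Counter


def lz78_parse(s):
    """Textbook incremental parsing (an assumption, not the paper's code):
    each phrase is the shortest prefix of the remaining input that has not
    already appeared as a phrase."""
    seen = set()
    phrases = []
    current = ""
    for ch in s:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Leftover input at the end may repeat an earlier phrase.
        phrases.append(current)
    return phrases


def count_repetitions(phrases):
    """Count phrase occurrences beyond the first occurrence of each
    distinct phrase. A distinct parsing has a count of zero."""
    counts = Counter(phrases)
    return sum(c - 1 for c in counts.values())


if __name__ == "__main__":
    distinct = lz78_parse("aababcabcd")
    print(distinct, count_repetitions(distinct))

    # A fixed-length parsing, by contrast, may repeat phrases freely.
    fixed = ["ab", "ab", "cd", "ab"]
    print(fixed, count_repetitions(fixed))
```

A parsing with a repetition count of zero is distinct in the sense of the original proof; the quasi-distinct property relaxes this by tolerating a nonzero count, as long as it stays within the asymptotic bound given above.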

Original language: English
Pages (from-to): 1-14
Number of pages: 14
Journal: Theoretical Computer Science
Volume: 422
State: Published - 9 Mar 2012

Keywords

  • Arithmetic progressions
  • Optimal compression
  • Parsing

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
