Abstract
We consider the problem of lossless compression of individual sequences using finite-state (FS) machines, from the perspective of the best achievable empirical cumulant generating function (CGF) of the code length, i.e., the normalized logarithm of the empirical average of the exponentiated code length. Since, in the probabilistic setting, the minimum achievable CGF of the code length is given by the Rényi entropy of the source, one of the motivations of this paper is to derive an individual-sequence analogue of the Rényi entropy, in the same way that the FS compressibility is the individual-sequence counterpart of the Shannon entropy. We consider the CGF of the code length from the perspectives of both fixed-to-variable (F-V) length coding and variable-to-variable (V-V) length coding; the latter turns out to yield a better result, which coincides with the FS compressibility. We also extend our results to compression with side information available at both the encoder and the decoder. In this case, the V-V version no longer coincides with the FS compressibility, but results in a different complexity measure.
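To make the phrase "normalized logarithm of the empirical average of the exponentiated code length" concrete, the sketch below spells out one natural formalization in our own notation (the block partitioning into length-$k$ segments, the parameter $\lambda$, and the normalization are our assumptions, not definitions quoted from the paper), together with Campbell's classical probabilistic counterpart, stated from memory, which motivates the Rényi-entropy connection mentioned in the abstract.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% Sketch in our own notation (not quoted from the paper): an individual sequence
% x_1^n is split into n/k non-overlapping blocks of length k, and \ell(.) denotes
% the code length that a finite-state encoder assigns to a block. For a parameter
% \lambda > 0, the empirical CGF of the code length is then of the form
\[
  \Lambda_n(\lambda) \;=\;
  \frac{1}{\lambda k}\,
  \log\!\left(\frac{k}{n}\sum_{i=0}^{n/k-1}
      e^{\lambda\,\ell\left(x_{ik+1}^{(i+1)k}\right)}\right),
\]
% i.e., the normalized logarithm of the empirical average of the exponentiated
% code length; as \lambda -> 0 it reduces to the ordinary per-symbol average length.

% Probabilistic counterpart (Campbell's theorem, stated from memory; base
% conversions and an additive constant of at most one code symbol aside):
% over prefix codes for a source P, the minimum achievable CGF is the
% Rényi entropy of order \alpha = 1/(1+\lambda):
\[
  \min_{\ell}\;\frac{1}{\lambda}\,
  \log \mathbb{E}\!\left[e^{\lambda\,\ell(X)}\right]
  \;\approx\; H_{\alpha}(P)
  \;=\; \frac{1}{1-\alpha}\,\log \sum_{x} P^{\alpha}(x),
  \qquad \alpha=\frac{1}{1+\lambda}.
\]

\end{document}
```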
| Original language | English |
| --- | --- |
| Article number | 8052537 |
| Pages (from-to) | 7729-7736 |
| Number of pages | 8 |
| Journal | IEEE Transactions on Information Theory |
| Volume | 63 |
| Issue number | 12 |
| DOIs | |
| State | Published - Dec 2017 |
Keywords
- Individual sequences
- Lempel-Ziv algorithm
- Rényi entropy
- compressibility
- cumulant generating function
- finite-state machines
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences