Efficient Compression of Long Arbitrary Sequences with No Reference at the Encoder

Yuval Cassuto, Jacob Ziv

Research output: Working paper / Preprint

Abstract

In a distributed information application an encoder compresses an arbitrary vector while a similar reference vector is available to the decoder as side information. For the Hamming-distance similarity measure, and when guaranteed perfect reconstruction is required, we present two contributions to the solution of this problem. One result shows that when a set of potential reference vectors is available to the encoder, lower compression rates can be achieved when the set satisfies a certain clustering property. Another result reduces the best known decoding complexity from exponential in the vector length $n$ to $O(n^{1.5})$ by generalized concatenation of inner coset codes and outer error-correcting codes. One potential application of the results is the compression of DNA sequences, where similar (but not identical) reference vectors are shared among senders and receivers.
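The coset-coding idea mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's construction. It assumes a vector length of 7, a reference vector within Hamming distance 1 of the source vector, and the standard (7,4) Hamming code as the coset code. The function names `syndrome`, `encode`, and `decode` are hypothetical and chosen only for this illustration. The encoder never sees the reference vector and sends only a 3-bit syndrome instead of 7 bits, yet the decoder reconstructs the source vector exactly.

```python
def syndrome(bits):
    """Return the 3-bit syndrome (as an int 0..7) of a 7-bit vector.

    Column i (1-indexed) of the (7,4) Hamming parity-check matrix is the
    binary representation of i, so the syndrome equals the XOR of the
    indices of all positions that hold a 1.
    """
    s = 0
    for i, b in enumerate(bits, start=1):
        if b:
            s ^= i
    return s


def encode(x):
    """Encoder: compress 7 bits to the 3-bit index of the coset containing x."""
    assert len(x) == 7
    return syndrome(x)


def decode(s, y):
    """Decoder: recover x from its syndrome s and a reference y.

    Assumption: y differs from x in at most one position.
    """
    # The syndrome is linear, so s XOR syndrome(y) is the syndrome of x XOR y.
    diff = s ^ syndrome(y)
    x_hat = list(y)
    if diff:
        # A nonzero value is the 1-indexed position where x and y differ.
        x_hat[diff - 1] ^= 1
    return x_hat


# Usage: the reference y differs from x in one position (index 3).
x = [1, 0, 1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 0, 0, 1]
assert decode(encode(x), y) == x
```

Reconstruction is guaranteed here because every nonzero syndrome of this code identifies a unique single-bit difference. Handling larger distances and long vectors with low decoding complexity is the harder problem the abstract addresses.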
Original language: English
State: Published - 23 Feb 2020

Keywords

  • cs.IT
  • math.IT
