Efficient Distributed Source Coding of Fragmented Genomic Sequencing Data

Yotam Gershon, Yuval Cassuto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper we present a new compression scheme for genomic read data produced by modern sequencing technologies. In this setting, a reference genome similar to the one being sequenced is available only at the decoder, while the starting index of each read in this reference is unknown. The proposed scheme significantly reduces the encoding complexity relative to known reference-based compression schemes. The results include a code construction based on generalized concatenation coset codes, analysis of the decoding failure probability, and optimization of the scheme parameters for minimal compression rate.

Original language: English
Title of host publication: 2021 IEEE International Symposium on Information Theory, ISIT 2021 - Proceedings
Pages: 3302-3307
Number of pages: 6
ISBN (Electronic): 9781538682098
State: Published - 12 Jul 2021
Event: 2021 IEEE International Symposium on Information Theory, ISIT 2021 - Virtual, Melbourne, Australia
Duration: 12 Jul 2021 → 20 Jul 2021

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
Volume: 2021-July

Conference

Conference: 2021 IEEE International Symposium on Information Theory, ISIT 2021
Country/Territory: Australia
City: Virtual, Melbourne
Period: 12/07/21 → 20/07/21

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Information Systems
  • Modelling and Simulation
  • Applied Mathematics
