An Upper Bound on the Capacity of the DNA Storage Channel

Andreas Lenz, Paul H. Siegel, Antonia Wachter-Zeh, Eitan Yaakobi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Driven by recent advances in sequencing and synthesis technologies, DNA has evolved into a competitive medium for long-term data storage. In this paper, we conduct an information-theoretic study of the storage channel, the entity that formulates the relation between stored and sequenced strands. In particular, we derive an upper bound on the Shannon capacity of the channel. Our channel model incorporates the main attributes that characterize DNA-based data storage: information is synthesized on many short DNA strands, and each strand is copied many times. Due to the storage and sequencing methods, the receiver draws strands from the original sequences in an uncontrollable manner, so copies of the same sequence may be drawn multiple times. Additionally, due to imperfections, the obtained strands can be perturbed by errors. We show that, for a large range of parameters, the channel decomposes into sub-channels from each input sequence to multiple output sequences, so-called clusters. The cluster sizes follow a Poisson distribution. Furthermore, the ordering of the sub-channels is unknown to the receiver. Our results can be used to guide future experiments for DNA-based data storage by giving an upper bound on the achievable rate of any error-correcting code. We further give a detailed discussion and intuitive interpretation of the channel that provide insights into its nature and can inspire new ideas for error-correcting codes and decoding methods.
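The channel described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the model from the paper: reads are drawn uniformly with replacement from the stored strands, and the only error type is an independent substitution per base. The names `simulate_channel`, `cluster_sizes`, and `sub_prob` are hypothetical and chosen for illustration.

```python
import random
from collections import Counter

ALPHABET = "ACGT"


def simulate_channel(strands, num_reads, sub_prob, rng):
    """Toy DNA storage channel (illustrative assumptions only).

    Each read picks one stored strand uniformly at random, with
    replacement, and flips each base to a different base with
    probability sub_prob. Returns the noisy reads and, for analysis
    only, the index of the strand each read came from. A real
    receiver would not see these indices.
    """
    reads, origins = [], []
    for _ in range(num_reads):
        idx = rng.randrange(len(strands))
        noisy = "".join(
            rng.choice([b for b in ALPHABET if b != base])
            if rng.random() < sub_prob
            else base
            for base in strands[idx]
        )
        reads.append(noisy)
        origins.append(idx)
    return reads, origins


def cluster_sizes(origins, num_strands):
    """Number of reads drawn from each stored strand (0 if never drawn)."""
    counts = Counter(origins)
    return [counts.get(i, 0) for i in range(num_strands)]


if __name__ == "__main__":
    rng = random.Random(0)
    # 1000 random strands of length 20, sequenced with 3000 reads.
    stored = ["".join(rng.choice(ALPHABET) for _ in range(20)) for _ in range(1000)]
    reads, origins = simulate_channel(stored, 3000, 0.01, rng)
    sizes = cluster_sizes(origins, len(stored))
    # Histogram of cluster sizes; with many strands and a fixed
    # reads-per-strand ratio it is close to Poisson with that mean.
    print(sorted(Counter(sizes).items()))
```

In this sketch the cluster sizes are exactly multinomial; they approach a Poisson distribution with mean `num_reads / len(strands)` when the number of strands is large. Because each read is drawn independently, the order of the reads carries no information about which strand produced them, which mirrors the unknown ordering of sub-channels.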

Original language: English
Title of host publication: 2019 IEEE Information Theory Workshop, ITW 2019
ISBN (Electronic): 9781538669006
State: Published - Aug 2019
Event: 2019 IEEE Information Theory Workshop, ITW 2019 - Visby, Sweden
Duration: 25 Aug 2019 - 28 Aug 2019

Publication series

Name: 2019 IEEE Information Theory Workshop, ITW 2019

Conference

Conference: 2019 IEEE Information Theory Workshop, ITW 2019
Country/Territory: Sweden
City: Visby
Period: 25/08/19 - 28/08/19

All Science Journal Classification (ASJC) codes

  • Software
  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Information Systems
