Sequence Design and Reconstruction Under the Repeat Channel in Enzymatic DNA Synthesis

Roy Shafir, Omer Sabary, Leon Anavy, Eitan Yaakobi, Zohar Yakhini

Research output: Contribution to journal › Article › peer-review

Abstract

Using synthetic DNA for data storage and for physical information encoding in labeling, tracing, and authentication applications is becoming more feasible as synthesis and reading technologies improve. DNA in data storage applications has several advantages, such as very high physical density and robustness. Some of the new synthesis technologies lead to repetition noise, consisting of sticky insertions and deletions in the resulting messages. In this paper, we address reconstruction algorithms for multiple-trace communication channels with repetition (sticky insertion and deletion) noise. We prove correctness and analyze failure rates, both analytically and on simulated data. We identify a failure mechanism related to alternating stretches in the design sequence that leads to a potential bias in the data derived from reads (traces) and used for reconstruction. To minimize this effect, we introduce alternating-length limited codes (ALL codes) and analyze some of their properties.
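
The repetition noise described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' channel model or reconstruction algorithm, which the abstract does not specify. It assumes that each symbol independently triggers a sticky insertion or deletion inside its own run, and that a run never vanishes entirely. The function names, the parameters `p_ins` and `p_del`, and the reading of an "alternating stretch" as a substring of the form XYXY... are all hypothetical choices made for illustration.

```python
import random
from itertools import groupby


def run_length_encode(seq):
    """Split a sequence into (symbol, run length) pairs."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def repeat_channel(seq, p_ins=0.1, p_del=0.1, rng=None):
    """Toy repeat channel (an assumption, not the paper's exact model).

    Each symbol independently lengthens its run by one with probability
    p_ins or shortens it by one with probability p_del. Runs are kept at
    length >= 1, so the order of runs is preserved in every trace.
    """
    rng = rng or random.Random()
    out = []
    for base, length in run_length_encode(seq):
        new_length = length
        for _ in range(length):
            r = rng.random()
            if r < p_ins:
                new_length += 1      # sticky insertion
            elif r < p_ins + p_del:
                new_length -= 1      # sticky deletion
        out.append(base * max(new_length, 1))  # assumed: runs never vanish
    return "".join(out)


def longest_alternating_stretch(seq):
    """Length of the longest substring of the form XYXY... with X != Y."""
    if not seq:
        return 0
    best = cur = 1
    for i in range(1, len(seq)):
        if seq[i] == seq[i - 1]:
            cur = 1                  # a repeated symbol breaks the alternation
        elif i >= 2 and seq[i] == seq[i - 2] and cur >= 2:
            cur += 1                 # the alternation continues
        else:
            cur = 2                  # a new two-symbol alternation starts
        best = max(best, cur)
    return best


if __name__ == "__main__":
    design = "AACGGGTACACACT"
    rng = random.Random(0)
    traces = [repeat_channel(design, rng=rng) for _ in range(5)]
    for trace in traces:
        print(trace)
    print(longest_alternating_stretch(design))
```

Under these assumptions every trace has the same order of runs as the design sequence, so only the run lengths need to be estimated from multiple traces. A design rule in the spirit of ALL codes would additionally cap the value returned by `longest_alternating_stretch`; the precise constraint is defined in the paper itself.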

Original language: English
Pages (from-to): 675-691
Number of pages: 17
Journal: IEEE Transactions on Communications
Volume: 72
Issue number: 2
State: Published - 1 Feb 2024

Keywords

  • DNA-based data storage
  • DNA-based informatics
  • alternating-length limited codes
  • deletion channel
  • enzymatic DNA synthesis
  • insertion channel
  • nucleic acid sequencing
  • sequence reconstruction

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering
