Reconstruction of strings from their substrings spectrum

Sagi Marcovich, Eitan Yaakobi

Research output: Contribution to journal › Article › peer-review

Abstract

This paper studies the reconstruction of strings from their substrings spectrum. Under this paradigm, all substrings of some fixed length are assumed to be received, and the goal is to reconstruct the string. While many existing works assume that the substrings are received error-free, this paper follows the noisy setup of the problem first studied by Gabrys and Milenkovic. The goal of the study is twofold. First, we study the setup in which not all substrings in the multispectrum are received; then we focus on the case where the read substrings are not error-free. In each case, we provide specific constructions of codes whose strings are guaranteed to be reconstructible even in the presence of failures in either model. We present efficient encoding and decoding maps and analyze the cardinality of the code constructions, while studying the cases where the rates of our codes approach 1.
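As an illustration of the noiseless baseline that the abstract contrasts against, the following is a minimal Python sketch, not the paper's construction. It computes the multiset of all length-L substrings of a string and stitches the string back together, assuming the spectrum is complete and error-free and that the length-(L-1) prefixes of the received substrings are all distinct. The function names `multispectrum` and `reconstruct` are hypothetical and chosen only for this example.

```python
from collections import Counter


def multispectrum(s: str, L: int) -> Counter:
    """Multiset of all length-L substrings of s (illustrative helper)."""
    return Counter(s[i:i + L] for i in range(len(s) - L + 1))


def reconstruct(spectrum: Counter, L: int) -> str:
    """Rebuild a string from a complete, error-free length-L multispectrum.

    Assumption of this sketch: L >= 2 and the (L-1)-prefixes of the received
    substrings are pairwise distinct, so each overlap has a unique successor.
    Raises ValueError when that assumption fails.
    """
    if L < 2:
        raise ValueError("this sketch requires L >= 2")
    grams = list(spectrum.elements())
    if not grams:
        raise ValueError("empty spectrum")

    # Index every substring by its (L-1)-prefix; a repeat means ambiguity.
    by_prefix = {}
    for g in grams:
        if g[:L - 1] in by_prefix:
            raise ValueError("repeated (L-1)-prefix: reconstruction is ambiguous")
        by_prefix[g[:L - 1]] = g

    # The first substring is the one whose prefix is not the suffix of any substring.
    suffixes = {g[1:] for g in grams}
    starts = [g for g in grams if g[:L - 1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("cannot identify a unique starting substring")

    # Extend one symbol at a time by following the unique overlap.
    out = starts[0]
    for _ in range(len(grams) - 1):
        nxt = by_prefix.get(out[-(L - 1):])
        if nxt is None:
            raise ValueError("spectrum is incomplete: missing successor")
        out += nxt[-1]
    return out
```

For instance, `reconstruct(multispectrum("ACGTTGCA", 4), 4)` recovers the original string, while a string with repeated overlaps such as `"ABAB"` with `L = 2` is rejected as ambiguous. The sketch also fails as soon as a substring is missing from the spectrum, which is the kind of failure the paper's code constructions are designed to tolerate.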

Original language: English
Article number: 9443207
Pages (from-to): 4369-4384
Number of pages: 16
Journal: IEEE Transactions on Information Theory
Volume: 67
Issue number: 7
State: Published - Jul 2021

Keywords

  • DNA
  • DNA sequencing
  • Decoding
  • Encoding
  • Hamming distance
  • Noise measurement
  • Reconstruction of sequences
  • Sequential analysis
  • Tools
  • substring-distant strings
  • substring-unique strings

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
