Generalized Unique Reconstruction From Substrings

Yonatan Yehezkeally, Daniella Bar-Lev, Sagi Marcovich, Eitan Yaakobi

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of pre-defined lengths are read or substrings are read with no overlap for the single string case, this work studies two extensions of this paradigm. The first extension considers the setup in which consecutive substrings are read with some given minimum overlap. First, an upper bound is provided on the attainable rates of codes that guarantee unique reconstruction. Then, efficient constructions of codes that asymptotically meet that upper bound are presented. In the second extension, we study the setup where multiple strings are reconstructed together. Given the number of strings and their length, we first derive a lower bound on the read substrings' length ℓ that is necessary for the existence of multi-strand reconstruction codes with non-vanishing rates. We then present two constructions of such codes and show that their rates approach 1 for values of ℓ that asymptotically behave like the lower bound.
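As a rough illustration of the "all substrings of a pre-defined length are read" extreme case mentioned in the abstract, the sketch below checks whether a small set of strings can be told apart from the multiset of their length-ℓ substrings. It is not the paper's construction: the function names `substring_multiset` and `is_uniquely_reconstructible` and the toy binary codewords are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def substring_multiset(s, length):
    """Multiset of all contiguous substrings of `s` with the given length."""
    return Counter(s[i:i + length] for i in range(len(s) - length + 1))


def is_uniquely_reconstructible(codewords, length):
    """True if no two distinct codewords share the same substring multiset."""
    return all(
        substring_multiset(a, length) != substring_multiset(b, length)
        for a, b in combinations(set(codewords), 2)
    )


# Hypothetical toy code: two binary strings of length 4.
code = ["0010", "0100"]
print(is_uniquely_reconstructible(code, 2))  # False: both yield {00, 01, 10}
print(is_uniquely_reconstructible(code, 3))  # True: {001, 010} vs {010, 100}
```

In this toy case, reads of length 2 cannot distinguish the two strings while reads of length 3 can, which gives a small-scale picture of why the read length ℓ matters for unique reconstruction.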

Original language: English
Pages (from-to): 5648-5659
Number of pages: 12
Journal: IEEE Transactions on Information Theory
Volume: 69
Issue number: 9
State: Published - 1 Sep 2023

Keywords

  • DNA sequences
  • Sequence reconstruction
  • Error correction codes
  • Worst-case analysis

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
  • Computer Science Applications
