TY - GEN
T1 - Correcting a Single Deletion in Reads from a Nanopore Sequencer
AU - Banerjee, Anisha
AU - Yehezkeally, Yonatan
AU - Wachter-Zeh, Antonia
AU - Yaakobi, Eitan
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Owing to their several merits over other DNA sequencing technologies, nanopore sequencers hold immense potential to revolutionize the efficiency of DNA storage systems. However, their higher error rates necessitate further research to devise practical and efficient coding schemes that would allow accurate retrieval of the stored data. Our work takes a step in this direction by adopting a simplified model of the nanopore sequencer inspired by Mao et al., which incorporates some of its physical aspects. This channel model can be viewed as a sliding window of length ℓ that passes over the incoming input sequence and produces the Hamming weight of the enclosed ℓ bits, while shifting by one position at each time step. The resulting (ℓ + 1)-ary vector, referred to as the ℓ-read vector, is susceptible to deletion errors due to imperfections inherent in the sequencing process. We establish that at least log n − ℓ bits of redundancy are needed to correct a single deletion. An error-correcting code that is optimal up to an additive constant is also proposed. Furthermore, we find that for ℓ ≥ 2, reconstruction from two distinct noisy ℓ-read vectors can be accomplished without any redundancy, and we provide a suitable reconstruction algorithm to this effect.
AB - Owing to their several merits over other DNA sequencing technologies, nanopore sequencers hold immense potential to revolutionize the efficiency of DNA storage systems. However, their higher error rates necessitate further research to devise practical and efficient coding schemes that would allow accurate retrieval of the stored data. Our work takes a step in this direction by adopting a simplified model of the nanopore sequencer inspired by Mao et al., which incorporates some of its physical aspects. This channel model can be viewed as a sliding window of length ℓ that passes over the incoming input sequence and produces the Hamming weight of the enclosed ℓ bits, while shifting by one position at each time step. The resulting (ℓ + 1)-ary vector, referred to as the ℓ-read vector, is susceptible to deletion errors due to imperfections inherent in the sequencing process. We establish that at least log n − ℓ bits of redundancy are needed to correct a single deletion. An error-correcting code that is optimal up to an additive constant is also proposed. Furthermore, we find that for ℓ ≥ 2, reconstruction from two distinct noisy ℓ-read vectors can be accomplished without any redundancy, and we provide a suitable reconstruction algorithm to this effect.
UR - http://www.scopus.com/inward/record.url?scp=85202901099&partnerID=8YFLogxK
U2 - 10.1109/ISIT57864.2024.10619468
DO - 10.1109/ISIT57864.2024.10619468
M3 - Conference contribution
AN - SCOPUS:85202901099
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 103
EP - 108
BT - 2024 IEEE International Symposium on Information Theory, ISIT 2024 - Proceedings
T2 - 2024 IEEE International Symposium on Information Theory, ISIT 2024
Y2 - 7 July 2024 through 12 July 2024
ER -