Abstract
Motivated by DNA-based applications, we generalize the bee identification problem proposed by Tandon et al. (2019). In this setup, we transmit all M codewords from a codebook over some channel and each codeword results in N noisy outputs. Then our task is to identify each codeword from this unordered set of MN noisy outputs. First, via a reduction to a minimum-cost flow problem on a related bipartite flow network called the input-output flow network, we show that the problem can be solved in O(M3) time in the worst case. Next, we consider the deletion and the insertion channels individually, and in both cases, we study the expected number of edges in their respective input-output networks. Specifically, we obtain closed expressions for this quantity for certain codebooks and when the codebook comprises all binary words, we show that this quantity is sub-quadratic when the deletion or insertion probability is less than 1/2. This then implies that the expected running time to perform joint decoding for this codebook is o(M3). For other codebooks, we develop methods to compute the expected number of edges efficiently. Finally, we adapt classical peeling-decoding techniques to reduce the number of nodes and edges in the input-output flow network.
Original language | English |
---|---|
Pages (from-to) | 190-204 |
Number of pages | 15 |
Journal | IEEE Journal on Selected Areas in Information Theory |
Volume | 4 |
DOIs | |
State | Published - 2023 |
Keywords
- Bee identification problem
- deletion channels
- DNA-based data storage
- multidraw channels
ASJC Scopus subject areas
- Computer Networks and Communications
- Media Technology
- Artificial Intelligence
- Applied Mathematics