R&D: Iterative Soft Decoding Algorithm for DNA Storage Using Quality Score and Redecoding
Proposed soft decoding algorithm gives a 2.3% to 7.0% improvement in reading number reduction compared to the state-of-the-art decoding method.
This is a Press Release edited by StorageNewsletter.com on July 14, 2023 at 2:00 pm
Arxiv has published an article written by Jaeho Jeong, Department of Electrical and Computer Engineering, Institute of New Media and Communications (INMC), Seoul National University, Seoul 08826, South Korea, Hosung Park, Department of Computer Engineering and Department of ICT Convergence System Engineering, Chonnam National University, Gwangju 61186, South Korea, Hee-Youl Kwak, School of Electrical Engineering, University of Ulsan, Ulsan 44610, South Korea, Jong-Seon No, Department of Electrical and Computer Engineering, Institute of New Media and Communications (INMC), Seoul National University, Seoul 08826, South Korea, Hahyeon Jeon, Clinomics, Ulsan 44919, South Korea, Jeong Wook Lee, Department of Chemical Engineering, POSTECH, Pohang 37673, South Korea, and Jae-Won Kim, Department of Electronic Engineering, Engineering Research Institute (ERI), Gyeongsang National University, Jinju 52828, South Korea.
Abstract: “Ever since deoxyribonucleic acid (DNA) was considered as a next-generation data-storage medium, lots of research efforts have been made to correct errors occurred during the synthesis, storage, and sequencing processes using error correcting codes (ECCs). Previous works on recovering the data from the sequenced DNA pool with errors have utilized hard decoding algorithms based on a majority decision rule. To improve the correction capability of ECCs and robustness of the DNA storage system, we propose a new iterative soft decoding algorithm, where soft information is obtained from FASTQ files and channel statistics. In particular, we propose a new formula for log-likelihood ratio (LLR) calculation using quality scores (Q-scores) and a redecoding method which may be suitable for the error correction and detection in the DNA sequencing area. Based on the widely adopted encoding scheme of the fountain code structure proposed by Erlich et al., we use three different sets of sequenced data to show consistency for the performance evaluation. The proposed soft decoding algorithm gives 2.3% ~ 7.0% improvement of the reading number reduction compared to the state-of-the-art decoding method and it is shown that it can deal with erroneous sequenced oligo reads with insertion and deletion errors.”
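To give a rough sense of how Q-scores from a FASTQ file can become soft information, here is a minimal Python sketch. It is not the formula proposed in the article, which the abstract does not reproduce. It relies only on the standard Phred definition (error probability = 10^(-Q/10)) and on three assumptions made purely for illustration: Phred+33 quality encoding, a 2-bit mapping A=00, C=01, G=10, T=11, and substitution errors spread evenly over the three other bases. The function names are hypothetical.

```python
import math

# Assumed 2-bit mapping for illustration only (not taken from the article).
BASE_TO_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def fastq_quality_to_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Q-scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def phred_to_error_prob(q):
    """Standard Phred definition: P(base call is wrong) = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def base_to_bit_llrs(base, q):
    """Return two LLRs, log(P(bit=0) / P(bit=1)), for one called base.

    Assumes the called base is correct with probability 1 - p and that
    each of the other three bases is equally likely (p / 3 each). Under
    the mapping above, exactly two bases disagree with the called base
    in any given bit, so P(bit differs from the call) = 2p / 3.
    """
    p = phred_to_error_prob(q)
    p_flip = 2.0 * p / 3.0
    magnitude = math.log((1.0 - p_flip) / p_flip)
    # Positive LLR favours bit 0, negative LLR favours bit 1.
    return [magnitude if bit == 0 else -magnitude
            for bit in BASE_TO_BITS[base]]


if __name__ == "__main__":
    read, quality = "ACGT", "II5+"
    for base, q in zip(read, fastq_quality_to_scores(quality)):
        print(base, q, base_to_bit_llrs(base, q))
```

Under these assumptions, a base called with Q = 30 has an error probability of 0.001 and yields bit LLRs of magnitude about 7.31, while lower Q-scores give values closer to zero, so a soft decoder places less trust in those positions.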