New Approach to DNA Storage Makes System More Dynamic and Scalable, by Researchers From North Carolina State University

From North Carolina State University

Researchers from North Carolina State University have developed a new approach to DNA storage systems, giving users the ability to read or modify data files without destroying them and making the systems easier to scale up for practical use.

(Image credit: Kevin Lin)


“Most of the existing DNA data storage systems rely on polymerase chain reaction (PCR) to access stored files, which is very efficient at copying information but presents some significant challenges,” says Albert Keung, co-corresponding author of a paper on the work and an assistant professor of chemical and biomolecular engineering at NC State. “We’ve developed a system called Dynamic Operations and Reusable Information Storage, or DORIS, that doesn’t rely on PCR. That has helped us address some of the key obstacles facing practical implementation of DNA data storage technologies.”

DNA storage systems have the potential to hold orders of magnitude more information than existing systems of comparable size. However, existing technologies have struggled to address a range of concerns related to practical implementation.

Current systems rely on sequences of DNA called primer-binding sequences that are added to the ends of DNA strands that store information. In short, the primer-binding sequence of DNA serves as a file name. When you want a given file, you retrieve the strands of DNA bearing that sequence.
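To make the "file name" analogy concrete, here is a minimal sketch that treats a DNA pool as a list of strings. The `Strand` class, the `retrieve` helper and the sequences are hypothetical illustrations. In a real system the selection is done chemically, by primers binding their target sequence, not by string comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    # Hypothetical toy model: a primer-binding "address" attached to a data payload.
    address: str
    payload: str


def retrieve(pool: list[Strand], address: str) -> list[Strand]:
    """Return every strand whose primer-binding sequence matches the requested file name."""
    return [s for s in pool if s.address == address]


pool = [
    Strand("ACGTACGT", "TTGACCA"),
    Strand("ACGTACGT", "GGATCCA"),
    Strand("TGCATGCA", "CCCAAAT"),
]
file_a = retrieve(pool, "ACGTACGT")
print(len(file_a))  # 2
```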

Many of the practical barriers to DNA storage technologies revolve around the use of PCR to retrieve stored data. Systems that rely on PCR have to drastically raise and lower the temperature of the stored genetic material in order to rip the double-stranded DNA apart and reveal the primer-binding sequence. This results in all of the DNA – the primer-binding sequences and the data-storage sequences – swimming free in a kind of genetic soup. Existing technologies can then sort through the soup to find, retrieve and copy the relevant DNA using PCR. The temperature swings are problematic for developing practical technologies, and the PCR technique itself gradually consumes – or uses up – the original version of the file that is being retrieved.
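A toy way to see why repeated PCR-based access eats into a file: assume, hypothetically, that each access withdraws a fixed fraction of the pool as PCR template and that the withdrawn material is never returned. The remaining share of the original then decays geometrically with the number of accesses. The function name and the 5% aliquot are illustrative assumptions, not figures from the paper.

```python
def remaining_fraction(aliquot_fraction: float, accesses: int) -> float:
    """Fraction of the original pool left after repeated PCR-based accesses,
    assuming (hypothetically) that each access withdraws a fixed fraction of
    the pool as template and that withdrawn material is never returned."""
    if not 0.0 <= aliquot_fraction <= 1.0:
        raise ValueError("aliquot_fraction must be between 0 and 1")
    return (1.0 - aliquot_fraction) ** accesses


# Hypothetical 5% aliquot per access; print the surviving fraction after n reads.
for n in (1, 10, 50):
    print(n, round(remaining_fraction(0.05, n), 3))
```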

DORIS takes a different approach. Instead of using double-stranded DNA as a primer-binding sequence, it uses an ‘overhang’ that consists of a single strand of DNA, like a tail that streams behind the double-stranded DNA that actually stores data. While traditional techniques required temperature fluctuations to rip open the DNA in order to find the relevant primer-binding sequences, using a single-stranded overhang means that DORIS can find the appropriate primer-binding sequences without disturbing the double-stranded DNA.
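The overhang-based lookup can be pictured as matching a capture probe against each strand's single-stranded tail while the double-stranded data region is left untouched. This sketch assumes exact Watson-Crick complementarity as the binding rule; the helper names and sequences are hypothetical, and real hybridization also depends on mismatches, temperature and salt.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def binds_overhang(probe: str, overhang: str) -> bool:
    """Toy rule: a probe captures a strand only if it is the exact reverse
    complement of that strand's single-stranded overhang."""
    return probe == reverse_complement(overhang)


# Hypothetical ss-dsDNA strands: (single-stranded overhang, double-stranded data region).
pool = [("ACCTGA", "GATTACA"), ("TTGCAG", "CCGGTTA")]
probe = reverse_complement("ACCTGA")  # "TCAGGT"
# Only the overhang is inspected; the data region is never "opened".
hits = [strand for strand in pool if binds_overhang(probe, strand[0])]
print(hits)  # [('ACCTGA', 'GATTACA')]
```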

“In other words, DORIS can work at room temperature, making it much more feasible to develop DNA data management technologies that are viable in real-world scenarios,” says James Tuck, co-corresponding author of the paper and a professor of electrical and computer engineering at NC State.

Another benefit of not having to rip apart the DNA strands is that the DNA sequence in the overhang can be the same as a sequence found in the double-stranded region of the data file itself. That’s difficult to achieve in PCR-based systems without sacrificing information density, because the system wouldn’t be able to differentiate between primer-binding sequences and data-storage sequences.
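One way to picture the density argument: if data payloads in a PCR-based system must never contain a primer-binding site, some sequences become unusable as data. The brute-force count below assumes a hypothetical 4-nucleotide site and 8-nucleotide payloads, and excludes only exact occurrences; practical designs typically also exclude near-matches and reverse complements, which would shrink the usable set further. With an overhang address, this restriction need not apply to the double-stranded data region.

```python
from itertools import product


def count_payloads(length: int, forbidden: list[str]) -> int:
    """Brute-force count of DNA payloads of the given length that contain none of
    the forbidden subsequences (e.g. primer-binding sites a PCR system must avoid)."""
    return sum(
        1
        for bases in product("ACGT", repeat=length)
        if not any(site in "".join(bases) for site in forbidden)
    )


length = 8
unrestricted = 4 ** length                     # every sequence is usable as data
restricted = count_payloads(length, ["ACGT"])  # hypothetical 4-nt primer site excluded
print(unrestricted, restricted)  # 65536 64257
```

For this tiny case the constraint removes 1,279 of 65,536 sequences; the number lost grows with payload length and with the number of distinct primer sites that must be avoided.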

“DORIS allows us to significantly increase the information density of the system, and also makes it easier to scale up to handle really large databases,” says Kevin Lin, first author of the paper and a Ph.D. student at NC State.

And once DORIS has identified the correct DNA sequence, it doesn’t rely on PCR to make copies. Instead, it transcribes the DNA to RNA, which is then reverse-transcribed back into DNA that the data-storage system can read. In other words, it doesn’t have to consume the original file in order to read it.
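The read path can be sketched as two sequence transformations. Assumptions: idealized, error-free enzymes; each RNA copy carries the coding-strand sequence with U in place of T; and the cDNA is the reverse complement of the RNA. The helper names and the stored sequence are hypothetical. The point of the sketch is that reading only copies from the template, so the stored entry is unchanged afterwards.

```python
RNA_FROM_DNA = str.maketrans("T", "U")
DNA_COMPLEMENT_OF_RNA = str.maketrans("ACGU", "TGCA")


def transcribe(coding_strand: str, copies: int) -> list[str]:
    """Idealized in vitro transcription: each RNA copy has the coding-strand
    sequence with U in place of T. The DNA template is read, never modified."""
    return [coding_strand.translate(RNA_FROM_DNA) for _ in range(copies)]


def reverse_transcribe(rna: str) -> str:
    """Idealized reverse transcription: the cDNA is the reverse complement of the RNA."""
    return rna.translate(DNA_COMPLEMENT_OF_RNA)[::-1]


pool = {"file_A": "ATGCGTTAC"}  # hypothetical stored data strand
rna = transcribe(pool["file_A"], copies=3)
cdna = [reverse_transcribe(r) for r in rna]
print(rna[0], cdna[0])  # AUGCGUUAC GTAACGCAT
print(pool)             # the stored entry is unchanged after the read
```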

The single-stranded overhangs can also be modified, allowing users to rename files, delete files or ‘lock’ them – effectively making them invisible to other users.
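The four file operations can be summarized as state changes on each file's overhang. This is bookkeeping only: the `FileEntry` and `Database` classes are hypothetical, and the sketch does not model the molecular steps the researchers use to lock, unlock, rename or delete files.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    overhang: str         # single-stranded address used for retrieval
    locked: bool = False  # True while the overhang is unavailable for capture


@dataclass
class Database:
    files: list[FileEntry] = field(default_factory=list)

    def find(self, address: str) -> list[FileEntry]:
        # Locked files are invisible to retrieval, mirroring the "lock" behaviour.
        return [f for f in self.files if f.overhang == address and not f.locked]

    def lock(self, address: str) -> None:
        for f in self.files:
            if f.overhang == address:
                f.locked = True

    def unlock(self, address: str) -> None:
        for f in self.files:
            if f.overhang == address:
                f.locked = False

    def rename(self, old: str, new: str) -> None:
        for f in self.files:
            if f.overhang == old:
                f.overhang = new

    def delete(self, address: str) -> None:
        self.files = [f for f in self.files if f.overhang != address]


db = Database([FileEntry("ACCTGA"), FileEntry("TTGCAG")])
db.lock("ACCTGA")
print(len(db.find("ACCTGA")))  # 0 while locked
db.unlock("ACCTGA")
db.rename("ACCTGA", "GGATCC")
print(len(db.find("GGATCC")))  # 1
db.delete("TTGCAG")
print(len(db.files))           # 1
```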

“We’ve developed a functional prototype of DORIS, so we know it works,” Keung says. “We’re now interested in scaling it up, speeding it up and putting it into a device that automates the process, making it user-friendly.”

The paper, “Dynamic and scalable DNA-based information storage,” is published in the journal Nature Communications. The paper was also co-authored by Kevin Volkel, a Ph.D. student at NC State.

The work was done with support from the National Science Foundation, under grants CNS-1650148 and CNS-1901324; a North Carolina State University Research and Innovation Seed Funding Award; a North Carolina Biotechnology Center Flash Grant; and a Department of Education Graduate Assistance in Areas of Need fellowship.

Article: Dynamic and scalable DNA-based information storage

Nature Communications has published an article written by Kevin N. Lin, Department of Chemical and Biomolecular Engineering, North Carolina State University, Campus Box 7905, Raleigh, NC, 27695-7905, USA, Kevin Volkel, James M. Tuck, Department of Electrical and Computer Engineering, North Carolina State University, Campus Box 7911, Raleigh, NC, 27695-7911, USA, and Albert J. Keung, Department of Chemical and Biomolecular Engineering, North Carolina State University, Campus Box 7905, Raleigh, NC, 27695-7905, USA.

Molecular technologies unlock dynamic operations for DNA storage.
(a) The generic framework for DNA-based storage systems includes encoding of digital information to nucleotide sequences, DNA synthesis and storage, DNA sequencing, and decoding the desired information. (b) Schematic of challenges faced by PCR-based file access. (c) Schematic of DORIS (Dynamic Operations and Reusable Information Storage). ss-dsDNA strands enable repeatable information access through non-PCR-based magnetic separation, in vitro transcription, reverse transcription, and the return of separated files to the database. Additionally, the overhangs of ss-dsDNAs enable in-storage file operations including lock, unlock, rename, and delete.

Abstract: “The physical architectures of information storage systems often dictate how information is encoded, databases are organized, and files are accessed. Here we show that a simple architecture comprised of a T7 promoter and a single-stranded overhang domain (ss-dsDNA), can unlock dynamic DNA-based information storage with powerful capabilities and advantages. The overhang provides a physical address for accessing specific DNA strands as well as implementing a range of in-storage file operations. It increases theoretical storage densities and capacities by expanding the encodable sequence space and simplifies the computational burden in designing sets of orthogonal file addresses. Meanwhile, the T7 promoter enables repeatable information access by transcribing information from DNA without destroying it. Furthermore, saturation mutagenesis around the T7 promoter and systematic analyses of environmental conditions reveal design criteria that can be used to optimize information access. This simple but powerful ss-dsDNA architecture lays the foundation for information storage with versatile capabilities.”
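The abstract mentions designing sets of orthogonal file addresses. A common, simple approach (not necessarily the one used in the paper) is to pick addresses greedily so that every pair differs in at least a minimum number of positions. Since an overhang address in this scheme need not be screened against the data regions, a check like this only has to compare addresses with each other. The function names and parameters below are illustrative assumptions.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_addresses(length: int, min_distance: int, limit: int) -> list[str]:
    """Greedily pick up to `limit` addresses whose pairwise Hamming distance is at
    least `min_distance`. Real designs also screen GC content, secondary structure
    and cross-hybridization; this sketch checks distance only."""
    chosen: list[str] = []
    for bases in product("ACGT", repeat=length):
        candidate = "".join(bases)
        if all(hamming(candidate, c) >= min_distance for c in chosen):
            chosen.append(candidate)
            if len(chosen) == limit:
                break
    return chosen


addresses = greedy_addresses(length=6, min_distance=3, limit=10)
print(len(addresses), addresses[:3])
```

A lexicographic greedy scan is easy to reason about but rarely yields the largest possible address set; it is shown here only because it is short.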