Multidimensional data organization and random access in large-scale DNA storage systems

Journal Article

With impressive physical density and molecular-scale coding capacity, DNA is a promising substrate for building long-lasting data archival storage systems. To retrieve data from DNA storage, recent implementations typically rely on large libraries of meticulously designed orthogonal PCR primers, which fundamentally limit the capacity and scalability of practical DNA storage. This work combines nested and semi-nested PCR to enable multidimensional data organization and random access in large DNA storage. Our strategy effectively pushes the limit of DNA storage capacity and dramatically reduces the number of orthogonal primers needed for efficient PCR random access. Our design uses only k·n primers to uniquely address n^k data-encoding oligos. The architecture inherently supports various well-defined PCR random-access patterns that can be tailored to organize and preserve the underlying DNA-encoded data structures and relations in simple database-like formats such as rows, columns, tables, and blocks of data entries. We design in silico PCR experiments of a four-dimensional DNA storage to illustrate the mechanisms of sixteen different random-access patterns, each requiring no more than two PCR reactions to selectively amplify a target dataset of various sizes. To better approximate the physical system, we formulate mathematical models based on empirical distributions to analyze the effect of pipetting, PCR bias, and PCR stochasticity on the performance of multidimensional data queries from large DNA storage.
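The primer-count claim in the abstract can be illustrated with a small counting sketch. It is not the authors' implementation and does not model PCR. It assumes each of the k dimensions has its own set of n primers, so that an oligo address is a k-tuple of primer indices. The function names `primer_count`, `address_count`, and `select` are hypothetical, and n = 10 is an arbitrary illustrative value (k = 4 matches the four-dimensional storage mentioned above).

```python
from itertools import product


def primer_count(n, k):
    """Primers needed, assuming n distinct primers for each of k dimensions."""
    return k * n


def address_count(n, k):
    """Distinct k-tuple addresses available with n choices per dimension."""
    return n ** k


def select(n, k, query):
    """Return all addresses matching a query (a toy model of a random-access pattern).

    query is a length-k tuple: an int fixes that coordinate,
    None leaves it free (a wildcard over all n values).
    """
    if len(query) != k:
        raise ValueError("query must have exactly k entries")
    for q in query:
        if q is not None and not 0 <= q < n:
            raise ValueError("fixed coordinates must lie in range(n)")
    axes = [range(n) if q is None else [q] for q in query]
    return list(product(*axes))


if __name__ == "__main__":
    n, k = 10, 4  # illustrative values, not taken from the paper's experiments
    print(primer_count(n, k))    # 40 primers
    print(address_count(n, k))   # 10000 addressable oligos
    # Fix three coordinates and leave the last free: a "row" of n entries.
    print(len(select(n, k, (3, 7, 1, None))))  # 10
```

Under these assumptions, fixing fewer coordinates selects larger blocks: leaving two coordinates free yields n^2 addresses, which loosely mirrors the row, column, table, and block groupings the abstract describes.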

Cited Authors

  • Song, X; Shah, S; Reif, J

Published Date

  • November 26, 2021

Published In

Volume / Issue

  • 894 /

Start / End Page

  • 190 - 202

International Standard Serial Number (ISSN)

  • 0304-3975

Digital Object Identifier (DOI)

  • 10.1016/j.tcs.2021.09.021

Citation Source

  • Scopus