nuID: a universal naming scheme of oligonucleotides for Illumina, Affymetrix, and other microarrays.

Published

Journal Article

BACKGROUND: Oligonucleotide probes that are sequence-identical may have different identifiers between manufacturers and even between different versions of the same company's microarray. Sometimes the same identifier is reused and represents a completely different oligonucleotide, resulting in ambiguity and potential misidentification of the genes hybridizing to that probe.

RESULTS: We have devised a unique, non-degenerate encoding scheme that can be used as a universal representation to identify an oligonucleotide across manufacturers. We have named the encoded representation 'nuID', for nucleotide universal identifier. Inspired by the fact that the raw sequence of the oligonucleotide is the true definition of identity for a probe, the encoding algorithm uniquely and non-degenerately transforms the sequence itself into a compact identifier (a lossless compression). In addition, we added a redundancy check (checksum) to validate the integrity of the identifier. These two steps, encoding plus checksum, result in an nuID, which is a unique, non-degenerate, permanent, robust and efficient representation of the probe sequence. For commercial applications that require the sequence identity to be confidential, we have an encryption schema for nuID. We demonstrate the utility of nuIDs for the annotation of Illumina microarrays, and we believe the scheme has universal applicability as a source-independent naming convention for oligomers.

REVIEWERS: This article was reviewed by Itai Yanai, Rong Chen (nominated by Mark Gerstein), and Gregory Schuler (nominated by David Lipman).
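The two steps named in the abstract, lossless encoding plus a checksum, can be illustrated with a minimal Python sketch. The abstract does not give the exact bit layout, alphabet, or checksum formula, so everything below is an assumption made for illustration: 2 bits per base, three bases packed into one character of a 64-symbol alphabet, and a leading character that carries a checksum modulo 21 together with the number of padding bases. The names `encode_sketch` and `decode_sketch` are hypothetical, and the identifiers produced here should not be expected to match published nuIDs.

```python
import string

# Assumed 64-symbol alphabet (illustrative; not taken from the abstract).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "_."
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_sketch(seq: str) -> str:
    """Pack a nucleotide sequence into a compact, checksummed identifier."""
    seq = seq.upper().replace("U", "T")
    if not seq or any(base not in BASE_TO_BITS for base in seq):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T/U")

    # Pad with 'A' so the length is a multiple of 3 (three bases per symbol).
    pad = (-len(seq)) % 3
    padded = seq + "A" * pad

    # Each triplet becomes one 6-bit value in the range 0..63.
    codes = []
    for i in range(0, len(padded), 3):
        a, b, c = (BASE_TO_BITS[base] for base in padded[i:i + 3])
        codes.append(a * 16 + b * 4 + c)

    # Assumed checksum: sum of 6-bit values modulo 21, combined with the pad
    # count so that the prefix index stays below 64 (maximum 20 * 3 + 2 = 62).
    checksum = sum(codes) % 21
    prefix = ALPHABET[checksum * 3 + pad]
    return prefix + "".join(ALPHABET[code] for code in codes)


def decode_sketch(identifier: str) -> str:
    """Recover the original sequence, verifying the checksum first."""
    if len(identifier) < 2:
        raise ValueError("identifier too short")
    try:
        values = [ALPHABET.index(ch) for ch in identifier]
    except ValueError:
        raise ValueError("identifier contains a character outside the alphabet")

    checksum, pad = divmod(values[0], 3)
    codes = values[1:]
    # This simple checksum catches many corruptions, but not all of them.
    if sum(codes) % 21 != checksum:
        raise ValueError("checksum mismatch: identifier is corrupted")

    seq = "".join(
        BITS_TO_BASE[code >> 4] + BITS_TO_BASE[(code >> 2) & 3] + BITS_TO_BASE[code & 3]
        for code in codes
    )
    return seq[:len(seq) - pad]


if __name__ == "__main__":
    probe = "ACGTTGCAAGCTTAGGCTAACGTTAGCCATGCATGCAAGTCCGATTAGCA"
    identifier = encode_sketch(probe)
    print(identifier, decode_sketch(identifier) == probe)
```

Under these assumptions a 50-base probe shrinks to 18 characters, and because the identifier is derived only from the sequence, two sequence-identical probes from different manufacturers would receive the same identifier regardless of their vendor-assigned names.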

Cited Authors

  • Du, P; Kibbe, WA; Lin, SM

Published Date

  • January 1, 2007

Published In

Volume / Issue

  • 2 /

Start / End Page

  • 16 -

PubMed ID

  • 17540033

PubMed Central ID

  • 17540033

Electronic International Standard Serial Number (EISSN)

  • 1745-6150

International Standard Serial Number (ISSN)

  • 1745-6150

Digital Object Identifier (DOI)

  • 10.1186/1745-6150-2-16

Language

  • eng