Compact encoding strategies for DNA sequence similarity search.

Publication, Conference
States, DJ; Agarwal, P
Published in: Proceedings. International Conference on Intelligent Systems for Molecular Biology
January 1996

Determining whether two DNA sequences are similar is an essential component of DNA sequence analysis. Dynamic programming is the algorithm of choice if computational time is not the most important consideration. Heuristic search tools, such as BLAST, are computationally more efficient, but they may miss some of the sequence similarities (Altschul et al., 1990). These tools often use common k-tuples (words) between the two sequences to determine anchor points for the alignment, and spend most of their computational time extending the alignment beyond these anchor points. We discuss and provide a DNA sequence similarity search implementation (called SENSEI) that improves upon the performance of BLASTN by almost an order of magnitude for comparable sensitivity. This improvement is a result of using compactly encoded scoring tables for k-tuples, encoding bases with a single bit, filtering the sequence to remove the simple sequence repeats using XNUN, and masking the known species-specific repeats in the query sequence. To reduce memory requirements, especially for large genomic DNA query sequences, we recommend generating the neighborhood words from the target sequence at run-time, instead of generating them by preprocessing the query sequence.
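
As an illustration of two ideas named in the abstract (single-bit base encoding and k-tuple anchor points), here is a minimal Python sketch. It is not the SENSEI implementation. The abstract does not say which two base classes share a bit, so the purine/pyrimidine split below is an assumption, and the names `ONE_BIT`, `kmer_words`, and `common_anchors` are hypothetical.

```python
# Assumption: purines (A, G) map to 0 and pyrimidines (C, T) map to 1.
# The abstract only says bases are encoded "with a single bit".
ONE_BIT = {"A": 0, "G": 0, "C": 1, "T": 1}


def kmer_words(seq, k):
    """Yield (start, word) for every k-tuple, packing one bit per base.

    The word is updated by shifting in one bit per base, so each k-tuple
    costs constant work. Ambiguity codes such as N are not handled here
    and would raise KeyError.
    """
    mask = (1 << k) - 1
    word = 0
    for i, base in enumerate(seq.upper()):
        word = ((word << 1) | ONE_BIT[base]) & mask
        if i >= k - 1:
            yield i - k + 1, word


def common_anchors(query, target, k):
    """Return (query_pos, target_pos) pairs whose encoded k-tuples match."""
    index = {}
    for pos, word in kmer_words(query, k):
        index.setdefault(word, []).append(pos)
    return [
        (qpos, tpos)
        for tpos, word in kmer_words(target, k)
        for qpos in index.get(word, ())
    ]


if __name__ == "__main__":
    print(list(kmer_words("ACGT", 2)))
    print(common_anchors("ACGT", "GTAC", 2))
```

With one bit per base, distinct k-tuples such as AC and GT collapse to the same word, so matches found this way are only candidate anchors and still have to be checked against the full sequences. The payoff is that a k-tuple fits in k bits rather than 2k, which keeps lookup tables small.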

Published In

Proceedings. International Conference on Intelligent Systems for Molecular Biology

ISSN

1553-0833

Publication Date

January 1996

Volume

4

Start / End Page

211 / 217

Related Subject Headings

  • Software
  • Sequence Homology, Nucleic Acid
  • Sequence Analysis, DNA
  • Repetitive Sequences, Nucleic Acid
  • Molecular Sequence Data
  • Humans
  • Glucosephosphate Dehydrogenase
  • Gene Library
  • Databases, Factual
  • Base Sequence
 

Citation

APA: States, D. J., & Agarwal, P. (1996). Compact encoding strategies for DNA sequence similarity search. In Proceedings. International Conference on Intelligent Systems for Molecular Biology (Vol. 4, pp. 211–217).
Chicago: States, D. J., and P. Agarwal. “Compact encoding strategies for DNA sequence similarity search.” In Proceedings. International Conference on Intelligent Systems for Molecular Biology, 4:211–17, 1996.
ICMJE: States DJ, Agarwal P. Compact encoding strategies for DNA sequence similarity search. In: Proceedings International Conference on Intelligent Systems for Molecular Biology. 1996. p. 211–7.
MLA: States, D. J., and P. Agarwal. “Compact encoding strategies for DNA sequence similarity search.” Proceedings. International Conference on Intelligent Systems for Molecular Biology, vol. 4, 1996, pp. 211–17.
NLM: States DJ, Agarwal P. Compact encoding strategies for DNA sequence similarity search. Proceedings International Conference on Intelligent Systems for Molecular Biology. 1996. p. 211–217.
