Scholars@Duke publication: Informative priors based on transcription factor structural class improve de novo motif discovery.

Publication, Journal Article

Narlikar, L; Gordân, R; Ohler, U; Hartemink, AJ

Published in: Bioinformatics

July 15, 2006

MOTIVATION: An important problem in molecular biology is to identify the locations at which a transcription factor (TF) binds to DNA, given a set of DNA sequences believed to be bound by that TF. In previous work, we showed that information in the DNA sequence of a binding site is sufficient to predict the structural class of the TF that binds it. In particular, this suggests that we can predict which locations in any DNA sequence are more likely to be bound by certain classes of TFs than others. Here, we argue that traditional methods for de novo motif finding can be significantly improved by adopting an informative prior probability that a TF binding site occurs at each sequence location. To demonstrate the utility of such an approach, we present priority, a powerful new de novo motif finding algorithm.

RESULTS: Using data from TRANSFAC, we train three classifiers to recognize binding sites of basic leucine zipper, forkhead, and basic helix loop helix TFs. These classifiers are used to equip priority with three class-specific priors, in addition to a default prior to handle TFs of other classes. We apply priority and a number of popular motif finding programs to sets of yeast intergenic regions that are reported by ChIP-chip to be bound by particular TFs. priority identifies motifs the other methods fail to identify, and correctly predicts the structural class of the TF recognizing the identified binding sites.

AVAILABILITY: Supplementary material and code can be found at http://www.cs.duke.edu/~amink/.
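The general idea of a positional prior can be illustrated with a small sketch. This is not the priority algorithm, whose details are not given on this page; it only shows how a per-position prior could reweight motif scores in a single sequence. The function name `site_posterior`, the toy two-column motif, the uniform background of 0.25, and all prior values are assumptions made for illustration.

```python
from math import prod


def site_posterior(seq, pwm, prior, background=0.25):
    """Posterior over motif start positions in one sequence (illustrative only).

    seq:        DNA string over the alphabet ACGT
    pwm:        list of dicts, one per motif column, mapping base -> probability
    prior:      list of prior weights, one per valid start position
    background: assumed uniform background probability per base
    """
    width = len(pwm)
    n_starts = len(seq) - width + 1
    if len(prior) != n_starts:
        raise ValueError("prior needs one entry per valid start position")

    # Prior weight times the motif-vs-background likelihood ratio at each start.
    scores = []
    for i in range(n_starts):
        ratio = prod(pwm[j][seq[i + j]] / background for j in range(width))
        scores.append(prior[i] * ratio)

    total = sum(scores)
    if total == 0:
        raise ValueError("all positions have zero weight")
    return [s / total for s in scores]


# Hypothetical toy motif that prefers "CA".
pwm = [
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
seq = "CATTCA"  # "CA" occurs at start positions 0 and 4

uniform_post = site_posterior(seq, pwm, [0.2] * 5)
informed_post = site_posterior(seq, pwm, [0.1, 0.1, 0.1, 0.1, 0.6])
```

With the uniform prior, the two "CA" occurrences are equally probable. With the made-up informative prior, most of the posterior mass moves to the occurrence at position 4. In the setting the abstract describes, such prior weights would come from the class-specific classifiers rather than being set by hand.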

Duke Scholars

Author: Raluca Mihaela Gordan (Biostatistics & Bioinformatics, Division of Integrative Geno ...)

Author: Uwe Ohler (Biostatistics & Bioinformatics, Division of Biostatistics)

Author: Alexander J. Hartemink (Computer Science)

Published In

Bioinformatics

DOI

10.1093/bioinformatics/btl251

EISSN

1367-4811

Publication Date

July 15, 2006

Volume

22

Issue

14

Start / End Page

e384 / e392

Location

England

Related Subject Headings

Transcription Factors
Sequence Analysis, DNA
Sequence Alignment
Protein Binding
Molecular Sequence Data
Models, Molecular
Models, Genetic
Models, Chemical
DNA
Computer Simulation

Citation

APA: Narlikar, L., Gordân, R., Ohler, U., & Hartemink, A. J. (2006). Informative priors based on transcription factor structural class improve de novo motif discovery. Bioinformatics, 22(14), e384–e392. https://doi.org/10.1093/bioinformatics/btl251

Chicago: Narlikar, Leelavati, Raluca Gordân, Uwe Ohler, and Alexander J. Hartemink. “Informative priors based on transcription factor structural class improve de novo motif discovery.” Bioinformatics 22, no. 14 (July 15, 2006): e384–92. https://doi.org/10.1093/bioinformatics/btl251.

ICMJE: Narlikar L, Gordân R, Ohler U, Hartemink AJ. Informative priors based on transcription factor structural class improve de novo motif discovery. Bioinformatics. 2006 Jul 15;22(14):e384–92.

MLA: Narlikar, Leelavati, et al. “Informative priors based on transcription factor structural class improve de novo motif discovery.” Bioinformatics, vol. 22, no. 14, July 2006, pp. e384–92. Pubmed, doi:10.1093/bioinformatics/btl251.

NLM: Narlikar L, Gordân R, Ohler U, Hartemink AJ. Informative priors based on transcription factor structural class improve de novo motif discovery. Bioinformatics. 2006 Jul 15;22(14):e384–e392.
