Sequence features of DNA binding sites reveal structural class of associated transcription factor.
A key goal in molecular biology is to understand the mechanisms by which a cell regulates the transcription of its genes. One important aspect of this transcriptional regulation is the binding of transcription factors (TFs) to their specific cis-regulatory counterparts on the DNA. TFs recognize and bind their DNA counterparts according to the structure of their DNA-binding domains (e.g. zinc finger, leucine zipper, homeodomain). The structure of these domains can be used as a basis for grouping TFs into classes. Although the structure of DNA-binding domains varies widely across TFs generally, the TFs within a particular class bind to DNA in a similar fashion, suggesting the existence of class-specific features in the DNA sequences bound by each class of TFs.In this paper, we apply a sparse Bayesian learning algorithm to identify a small set of class-specific features in the DNA sequences bound by different classes of TFs; the algorithm simultaneously learns a true multi-class classifier that uses these features to predict the DNA-binding domain of the TF that recognizes a particular set of DNA sequences. We train our algorithm on the six largest classes in TRANSFAC, comprising a total of 587 TFs. We learn a six-class classifier for this training set that achieves 87% leave-one-out cross-validation accuracy. We also identify features within cis-regulatory sequences that are highly specific to each class of TF, which has significant implications for how TF binding sites should be modeled for the purpose of motif discovery.
Duke Scholars
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- Transcription Factors
- Structure-Activity Relationship
- Sequence Homology, Nucleic Acid
- Sequence Analysis, DNA
- Sequence Alignment
- Protein Binding
- Pattern Recognition, Automated
- Molecular Sequence Data
- DNA
- Conserved Sequence
Citation
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- Transcription Factors
- Structure-Activity Relationship
- Sequence Homology, Nucleic Acid
- Sequence Analysis, DNA
- Sequence Alignment
- Protein Binding
- Pattern Recognition, Automated
- Molecular Sequence Data
- DNA
- Conserved Sequence