Stability selection for regression-based models of transcription factor-DNA binding specificity.

Publication, Journal Article
Mordelet, F; Horton, J; Hartemink, AJ; Engelhardt, BE; Gordân, R
Published in: Bioinformatics
July 1, 2013

MOTIVATION: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret.

RESULTS: We propose novel regression-based models of TF-DNA binding specificity, trained using high resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TFs with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF-DNA binding specificity.

AVAILABILITY: Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.
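The abstract describes regression on mono-, di- and trinucleotide features at specific positions, followed by feature selection, and the title names stability selection. The following is a minimal sketch of that general idea, not the authors' implementation (their code is at the URL above). It assumes a Lasso regression solved by proximal gradient descent (ISTA), refit on random half-subsamples, with features ranked by how often they receive a nonzero weight. The function names (`positional_kmer_features`, `lasso_ista`, `stability_selection`), the penalty `alpha`, the 0.8 frequency cutoff, and the synthetic sequences are all illustrative assumptions; no PBM data are used.

```python
import itertools
import numpy as np

BASES = "ACGT"

def positional_kmer_features(seqs, max_k=3):
    """One-hot encode every k-mer (k = 1..max_k) at every start position.

    Assumes all sequences have the same length and contain only A, C, G, T.
    Feature names look like "CG@2" (k-mer CG starting at 0-based position 2).
    """
    length = len(seqs[0])
    names = [f"{''.join(kmer)}@{pos}"
             for k in range(1, max_k + 1)
             for pos in range(length - k + 1)
             for kmer in itertools.product(BASES, repeat=k)]
    index = {name: j for j, name in enumerate(names)}
    X = np.zeros((len(seqs), len(names)))
    for i, seq in enumerate(seqs):
        for k in range(1, max_k + 1):
            for pos in range(length - k + 1):
                X[i, index[f"{seq[pos:pos + k]}@{pos}"]] = 1.0
    return X, names

def lasso_ista(X, y, alpha, n_iter=300):
    """Minimise (1/2n)||y - Xw||^2 + alpha * ||w||_1 with ISTA."""
    n = X.shape[0]
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        # Soft-thresholding is the proximal operator of the L1 penalty.
        w = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)
    return w

def stability_selection(X, y, alpha, n_rounds=50, seed=0):
    """Return, per feature, the fraction of half-subsamples in which
    the Lasso assigns it a nonzero weight."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)  # centring stands in for an intercept
        ys = y[idx] - y[idx].mean()
        counts += lasso_ista(Xs, ys, alpha) != 0
    return counts / n_rounds

# Toy demo on synthetic data: the signal depends only on base A at position 1.
rng = np.random.default_rng(1)
seqs = ["".join(rng.choice(list(BASES), size=6)) for _ in range(200)]
y = np.array([3.0 * (s[1] == "A") for s in seqs]) + rng.normal(0, 0.3, size=200)

X, names = positional_kmer_features(seqs)
freq = stability_selection(X, y, alpha=0.3)
stable = [name for name, f in zip(names, freq) if f >= 0.8]
print(stable)
```

Ranking features by selection frequency rather than by a single fit is what keeps the final feature set small and less sensitive to the particular sample, which is the interpretability concern the abstract raises. In practice the penalty and the frequency cutoff would need to be tuned on real binding measurements.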

Published In

Bioinformatics

DOI

10.1093/bioinformatics/btt221

EISSN

1367-4811

Publication Date

July 1, 2013

Volume

29

Issue

13

Start / End Page

i117 / i125

Location

England

Related Subject Headings

  • Transcription Factors
  • Support Vector Machine
  • Saccharomyces cerevisiae Proteins
  • Protein Binding
  • Protein Array Analysis
  • Linear Models
  • Humans
  • Genome
  • DNA
  • Bioinformatics
 

Citation

APA: Mordelet, F., Horton, J., Hartemink, A. J., Engelhardt, B. E., & Gordân, R. (2013). Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics, 29(13), i117–i125. https://doi.org/10.1093/bioinformatics/btt221

Chicago: Mordelet, Fantine, John Horton, Alexander J. Hartemink, Barbara E. Engelhardt, and Raluca Gordân. “Stability selection for regression-based models of transcription factor-DNA binding specificity.” Bioinformatics 29, no. 13 (July 1, 2013): i117–25. https://doi.org/10.1093/bioinformatics/btt221.

ICMJE: Mordelet F, Horton J, Hartemink AJ, Engelhardt BE, Gordân R. Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics. 2013 Jul 1;29(13):i117–25.

MLA: Mordelet, Fantine, et al. “Stability selection for regression-based models of transcription factor-DNA binding specificity.” Bioinformatics, vol. 29, no. 13, July 2013, pp. i117–25. PubMed, doi:10.1093/bioinformatics/btt221.

NLM: Mordelet F, Horton J, Hartemink AJ, Engelhardt BE, Gordân R. Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics. 2013 Jul 1;29(13):i117–i125.
