A transcription factor affinity-based code for mammalian transcription initiation.

Publication, Journal Article
Megraw, M; Pereira, F; Jensen, ST; Ohler, U; Hatzigeorgiou, AG
Published in: Genome Res
April 2009

The recent arrival of large-scale cap analysis of gene expression (CAGE) data sets in mammals provides a wealth of quantitative information on coding and noncoding RNA polymerase II transcription start sites (TSS). Genome-wide CAGE studies reveal that a large fraction of TSS (approximately 45%) exhibit peaks where the vast majority of associated tags map to a particular location, whereas other active regions contain a broader distribution of initiation events. The presence of a strong single peak suggests that transcription at these locations may be mediated by position-specific sequence features. We therefore propose a new model for single-peaked TSS based solely on known transcription factors (TFs) and their respective regions of positional enrichment. This probabilistic model leads to near-perfect classification results in cross-validation (auROC = 0.98), and performance in genomic scans demonstrates that TSS prediction with both high accuracy and spatial resolution is achievable for a specific but large subgroup of mammalian promoters. The interpretable model structure suggests a DNA code in which canonical sequence features such as TATA-box, Initiator, and GC content do play a significant role, but many additional TFs show distinct spatial biases with respect to TSS location and are important contributors to the accurate prediction of single-peak transcription initiation sites. The model structure also reveals that CAGE tag clusters distal from annotated gene starts have distinct characteristics compared to those close to gene 5'-ends. Using this high-resolution single-peak model, we predict TSS for approximately 70% of mammalian microRNAs based on currently available data.
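The reported auROC (area under the receiver operating characteristic curve) can be read as the probability that a randomly chosen positive example (here, a true single-peak TSS) receives a higher classifier score than a randomly chosen negative example, with ties counted as one half. The minimal Python sketch below computes the metric by exhaustive pairwise comparison. The function name `auroc` and all scores are hypothetical illustrations, not the authors' code, model, or data.

```python
def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    scores higher; tied pairs contribute 0.5. Hypothetical helper for
    illustration only.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up scores for illustration; NOT results from the paper.
tss_scores = [0.91, 0.85, 0.78, 0.60]         # hypothetical true TSS windows
background_scores = [0.40, 0.55, 0.62, 0.20]  # hypothetical non-TSS windows
print(auroc(tss_scores, background_scores))   # 15 of 16 pairs ranked correctly: 0.9375
```

The pairwise loop is O(n·m) and is chosen for transparency; for large genomic scans a rank-based (Mann-Whitney U) computation gives the same value more efficiently.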

Published In

Genome Res

DOI

10.1101/gr.085449.108

ISSN

1088-9051

Publication Date

April 2009

Volume

19

Issue

4

Start / End Page

644 / 656

Location

United States

Related Subject Headings

  • Transcription, Genetic
  • Transcription Initiation Site
  • Transcription Factors
  • TATA Box
  • RNA Polymerase II
  • Promoter Regions, Genetic
  • Humans
  • Genome, Human
  • Gene Expression Regulation
  • Databases, Genetic

Citation

APA

Megraw, M., Pereira, F., Jensen, S. T., Ohler, U., & Hatzigeorgiou, A. G. (2009). A transcription factor affinity-based code for mammalian transcription initiation. Genome Res, 19(4), 644–656. https://doi.org/10.1101/gr.085449.108

Chicago

Megraw, Molly, Fernando Pereira, Shane T. Jensen, Uwe Ohler, and Artemis G. Hatzigeorgiou. “A transcription factor affinity-based code for mammalian transcription initiation.” Genome Res 19, no. 4 (April 2009): 644–56. https://doi.org/10.1101/gr.085449.108.

ICMJE

Megraw M, Pereira F, Jensen ST, Ohler U, Hatzigeorgiou AG. A transcription factor affinity-based code for mammalian transcription initiation. Genome Res. 2009 Apr;19(4):644–56.

MLA

Megraw, Molly, et al. “A transcription factor affinity-based code for mammalian transcription initiation.” Genome Res, vol. 19, no. 4, Apr. 2009, pp. 644–56. PubMed, doi:10.1101/gr.085449.108.

NLM

Megraw M, Pereira F, Jensen ST, Ohler U, Hatzigeorgiou AG. A transcription factor affinity-based code for mammalian transcription initiation. Genome Res. 2009 Apr;19(4):644–656.