
A comparative analysis of the information content in long and short SAGE libraries.

Publication, Journal Article
Li, Y-J; Xu, P; Qin, X; Schmechel, DE; Hulette, CM; Haines, JL; Pericak-Vance, MA; Gilbert, JR
Published in: BMC Bioinformatics
November 16, 2006

BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a powerful tool for determining gene expression profiles. SAGE libraries fall into two types, ShortSAGE and LongSAGE, according to the length of the SAGE tag (10 vs. 17 base pairs). LongSAGE libraries are thought to be more useful than ShortSAGE libraries, but the information content of the two types has not been widely compared. To dissect the differences between them, we used four libraries (two LongSAGE and two ShortSAGE) generated from the hippocampus of Alzheimer disease (AD) and control samples. In addition, we generated two further short libraries, the truncated LongSAGE (tSAGE) libraries, by deleting seven 5' base pairs from each LongSAGE tag.

RESULTS: One problem in SAGE studies is that, because a tag is short, an individual tag may match multiple different genes. We found that a LongSAGE tag maps to at most 15 UniGene clusters, whereas ShortSAGE and tSAGE tags map to as many as 279. Both long and short SAGE libraries contain a large number of orphan tags (tags with no gene information in UniGene), reflecting the limited coverage of the UniGene database. Among 100 orphan LongSAGE tags, the complete 17-bp sequences of nine tags matched 17 genomic sequences, and four of these tags matched a single genomic sequence. These data suggest that 4-9% of orphan LongSAGE tags could potentially be resolved. Finally, among 400 tSAGE tags showing significant differential expression between AD and control samples, 79 (19.8%) were derived from multiple non-significant LongSAGE tags, implying that these results are false positives.

CONCLUSION: Our data show that LongSAGE tags have higher specificity in gene mapping than ShortSAGE tags. LongSAGE tags also show an advantage over ShortSAGE tags in identifying novel genes by BLAST analysis. Most importantly, because of this difference in mapping specificity, the chance of obtaining false-positive results is higher for ShortSAGE than for LongSAGE libraries. We therefore recommend considering the number of UniGene clusters (genes or ESTs) to which a tag maps when prioritizing significant results.
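The tSAGE construction, and the way several distinct LongSAGE tags can collapse into one short tag, can be illustrated with a small sketch. It is not the authors' pipeline: the tag sequences, counts, and `Hs.*` cluster identifiers are hypothetical, the tag-to-UniGene mapping is assumed to be supplied as a precomputed dictionary, and all function names are illustrative. Following the abstract's wording, the sketch drops the first seven characters of each 17-bp tag (the `drop_5prime` flag switches to dropping the last seven, since which end is trimmed depends on how tags are recorded). It sums the counts of LongSAGE tags that become identical, lists tSAGE tags merged from more than one LongSAGE tag (the situation behind the 79 tSAGE tags in the results), and orders tags by the number of UniGene clusters they map to, in the spirit of the recommendation.

```python
"""Toy sketch: derive truncated (tSAGE) tags from LongSAGE tags and rank tags
by mapping ambiguity. All tags, counts, and cluster IDs are hypothetical."""
from collections import Counter, defaultdict

LONG_TAG_LEN = 17   # LongSAGE tag length (bp)
SHORT_TAG_LEN = 10  # ShortSAGE / tSAGE tag length (bp)


def truncate_tag(long_tag, drop_5prime=True):
    """Turn a 17-bp LongSAGE tag into a 10-bp tSAGE tag by removing seven bases.

    drop_5prime=True removes the first seven characters (following the
    abstract's wording); pass False to remove the last seven instead.
    """
    if len(long_tag) != LONG_TAG_LEN:
        raise ValueError(f"expected {LONG_TAG_LEN} bp, got {len(long_tag)}")
    n_drop = LONG_TAG_LEN - SHORT_TAG_LEN
    return long_tag[n_drop:] if drop_5prime else long_tag[:SHORT_TAG_LEN]


def build_tsage_library(long_counts, drop_5prime=True):
    """Collapse LongSAGE tag counts into tSAGE tag counts.

    Returns (tsage_counts, sources), where sources maps each tSAGE tag to the
    set of LongSAGE tags whose counts were summed into it.
    """
    tsage_counts = Counter()
    sources = defaultdict(set)
    for tag, count in long_counts.items():
        short = truncate_tag(tag, drop_5prime)
        tsage_counts[short] += count
        sources[short].add(tag)
    return dict(tsage_counts), dict(sources)


def merged_tags(sources):
    """Return tSAGE tags built from more than one distinct LongSAGE tag."""
    return sorted(t for t, longs in sources.items() if len(longs) > 1)


def rank_by_cluster_count(tags, tag_to_clusters):
    """Order tags so that those mapping to fewer UniGene clusters come first.

    Orphan tags (no cluster) are placed last; that choice is an assumption
    of this sketch, not a rule from the paper.
    """
    def key(tag):
        n = len(tag_to_clusters.get(tag, ()))
        return (n == 0, n, tag)
    return sorted(tags, key=key)


if __name__ == "__main__":
    long_counts = {
        "AAAAAAACCCCCGGGGG": 5,
        "TTTTTTTCCCCCGGGGG": 3,
        "GGGGGGGATATATATAT": 4,
    }
    counts, sources = build_tsage_library(long_counts)
    print(counts)                # {'CCCCCGGGGG': 8, 'ATATATATAT': 4}
    print(merged_tags(sources))  # ['CCCCCGGGGG']

    clusters = {"CCCCCGGGGG": {"Hs.A", "Hs.B", "Hs.C"}, "ATATATATAT": {"Hs.D"}}
    print(rank_by_cluster_count(["CCCCCGGGGG", "ATATATATAT", "GGGGGAAAAA"], clusters))
    # ['ATATATATAT', 'CCCCCGGGGG', 'GGGGGAAAAA']
```

In the toy data, two different LongSAGE tags share the same ten retained bases, so their counts are pooled into a single tSAGE tag; a differential-expression test run on the pooled count could flag a tag that neither LongSAGE tag would support on its own.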


Published In

BMC Bioinformatics

DOI

10.1186/1471-2105-7-504

EISSN

1471-2105

Publication Date

November 16, 2006

Volume

7

Start / End Page

504

Location

England

Related Subject Headings

  • RNA
  • Multigene Family
  • Models, Statistical
  • Humans
  • Hippocampus
  • Gene Library
  • Gene Expression Profiling
  • Gene Expression
  • False Positive Reactions
  • Databases, Genetic
 

Citation

APA: Li, Y.-J., Xu, P., Qin, X., Schmechel, D. E., Hulette, C. M., Haines, J. L., … Gilbert, J. R. (2006). A comparative analysis of the information content in long and short SAGE libraries. BMC Bioinformatics, 7, 504. https://doi.org/10.1186/1471-2105-7-504

Chicago: Li, Yi-Ju, Puting Xu, Xuejun Qin, Donald E. Schmechel, Christine M. Hulette, Jonathan L. Haines, Margaret A. Pericak-Vance, and John R. Gilbert. “A comparative analysis of the information content in long and short SAGE libraries.” BMC Bioinformatics 7 (November 16, 2006): 504. https://doi.org/10.1186/1471-2105-7-504.

ICMJE: Li Y-J, Xu P, Qin X, Schmechel DE, Hulette CM, Haines JL, et al. A comparative analysis of the information content in long and short SAGE libraries. BMC Bioinformatics. 2006 Nov 16;7:504.

MLA: Li, Yi-Ju, et al. “A comparative analysis of the information content in long and short SAGE libraries.” BMC Bioinformatics, vol. 7, Nov. 2006, p. 504. Pubmed, doi:10.1186/1471-2105-7-504.

NLM: Li Y-J, Xu P, Qin X, Schmechel DE, Hulette CM, Haines JL, Pericak-Vance MA, Gilbert JR. A comparative analysis of the information content in long and short SAGE libraries. BMC Bioinformatics. 2006 Nov 16;7:504.