A comparative analysis of the information content in long and short SAGE libraries.
BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a powerful tool for determining gene expression profiles. SAGE libraries are classified by tag length into two types, ShortSAGE (10 basepairs) and LongSAGE (17 basepairs). LongSAGE libraries are thought to be more useful than ShortSAGE libraries, but their information content has not been widely compared. To dissect the differences between the two types, we used four libraries (two LongSAGE and two ShortSAGE) generated from the hippocampus of Alzheimer disease (AD) and control samples. We also generated two additional short SAGE libraries, the truncated LongSAGE libraries (tSAGE), by deleting seven 5' basepairs from each LongSAGE tag.
RESULTS: A known problem in SAGE studies is that, because tags are short, an individual tag may match multiple different genes. We found that a single LongSAGE tag maps to as many as 15 UniGene clusters, whereas ShortSAGE and tSAGE tags map to as many as 279 UniGene clusters. Both long and short SAGE libraries contain a large number of orphan tags (tags with no gene information in UniGene), which points to the limitations of the UniGene database. Among 100 orphan LongSAGE tags, the complete 17-basepair sequences of nine tags match 17 genomic sequences, and four of those tags match a single genomic sequence. Our data therefore show the potential to resolve 4-9% of orphan LongSAGE tags. Finally, among 400 tSAGE tags showing significant differential expression between AD and control, 79 tags (19.8%) were derived from multiple non-significant LongSAGE tags, suggesting that these are false positive results.
CONCLUSION: Our data show that LongSAGE tags have higher specificity in gene mapping than ShortSAGE tags. LongSAGE tags also show an advantage over ShortSAGE tags in identifying novel genes by BLAST analysis. Most importantly, the chance of obtaining false positive results is higher for ShortSAGE than for LongSAGE libraries because of the lower gene-mapping specificity of ShortSAGE tags. We therefore recommend considering the number of UniGene clusters (genes or ESTs) to which a tag corresponds when prioritizing significant results.
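The tSAGE construction described above can be illustrated with a minimal sketch. It assumes tags are plain strings written 5' to 3', so that "deleting seven 5' basepairs" means dropping the first seven characters; the function names and the example counts are hypothetical and are not taken from the study. The sketch shows how distinct LongSAGE tags that share the same retained 10 basepairs collapse into one tSAGE tag whose count is the sum of theirs, which is the mechanism behind the merged tags reported in the results.

```python
from collections import Counter

TRIM = 7  # basepairs removed from each 17-bp LongSAGE tag, per the abstract


def truncate_tag(long_tag: str, trim: int = TRIM) -> str:
    """Drop `trim` bases from the start of the tag string.

    Assumption: the string is written 5' to 3', so the start is the 5' end.
    """
    return long_tag[trim:]


def build_tsage(long_counts: dict[str, int]) -> Counter:
    """Collapse LongSAGE tag counts into truncated (tSAGE) tag counts."""
    tsage = Counter()
    for tag, count in long_counts.items():
        # Long tags sharing the same retained 10 bp are summed together.
        tsage[truncate_tag(tag)] += count
    return tsage


# Hypothetical counts, for illustration only (not data from the study).
long_library = {
    "AAAAAAACCCCCCCCCC": 5,
    "GGGGGGGCCCCCCCCCC": 7,
    "TTTTTTTACGTACGTAC": 3,
}
tsage_library = build_tsage(long_library)
```

In this toy library the first two long tags differ only in the deleted bases, so they merge into a single tSAGE tag. A merged tag of this kind can appear differentially expressed even when none of its contributing LongSAGE tags is.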
Li, Y-J; Xu, P; Qin, X; Schmechel, DE; Hulette, CM; Haines, JL; Pericak-Vance, MA; Gilbert, JR