Building pathway clusters from Random Forests classification using class votes.

Published online

Journal Article

BACKGROUND: Recent years have seen the development of various pathway-based methods for the analysis of microarray gene expression data. These approaches have the potential to bring biological insights into microarray studies. A variety of methods have been proposed to construct networks using gene expression data. Because individual pathways do not act in isolation, it is important to understand how different pathways coordinate to perform cellular functions. However, there are no published methods describing how to build pathway clusters that are closely related to traits of interest.

RESULTS: We propose building pathway clusters from pathway-based classification methods, using the class votes produced by Random Forests. The proposed methods allow researchers to identify clusters of pathways that share similar functions. These pathways may or may not share genes. As an illustration, we applied our approach to three human breast cancer microarray data sets and obtained consistent and interpretable results for all three. Using PubMatrix, we further investigated one of the resulting pathway clusters and found that informative genes in this cluster have more publications with keywords such as estrogen receptor than informative genes in other top pathways. In addition, using shortest-path analysis in GeneGo's MetaCore and the Human Protein Reference Database, we identified the links that connect pathways within the cluster that share no genes.

CONCLUSION: Our proposed pathway clustering methods allow bioinformaticians and biologists to investigate how informative genes within pathways are related to each other and to understand possible crosstalk between pathways in a cluster. Building pathway clusters may therefore lead to a better understanding of the molecular mechanisms affecting a trait of interest and help generate further biological hypotheses from gene expression data.
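The abstract does not spell out the clustering step itself. One plausible reading of the title is that each pathway's Random Forest produces, for every sample, the proportion of trees voting for a given class, and that pathways whose vote profiles rise and fall together across samples are grouped. The sketch below assumes exactly that and nothing more: the vote matrix is taken as given (hypothetically, out-of-bag vote fractions), the distance is 1 minus the Pearson correlation, and grouping uses average-linkage agglomerative clustering with a cutoff. The pathway names (`PW_A` to `PW_D`), vote values, threshold, and helper names are all hypothetical, and this is an illustration rather than the authors' published procedure.

```python
"""Group pathways whose Random Forest class-vote profiles are correlated.

Hypothetical illustration: the vote matrix, pathway names, and threshold are
made up; real votes would come from one Random Forest fitted per pathway.
"""
from itertools import combinations

# Hypothetical input: for each pathway, the fraction of (assumed out-of-bag)
# trees in that pathway's Random Forest voting for one class, per sample.
VOTES = {
    "PW_A": [0.90, 0.80, 0.20, 0.10, 0.85, 0.15],
    "PW_B": [0.85, 0.75, 0.30, 0.20, 0.80, 0.10],
    "PW_C": [0.50, 0.40, 0.60, 0.55, 0.45, 0.60],
    "PW_D": [0.45, 0.35, 0.65, 0.60, 0.40, 0.55],
}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def average_linkage(votes, threshold):
    """Agglomerative clustering on 1 - Pearson distance between vote profiles.

    Repeatedly merges the two clusters with the smallest average pairwise
    distance, stopping once that smallest distance exceeds `threshold`.
    """
    # Pairwise distances between individual pathways, keyed by unordered pair.
    dist = {
        frozenset((a, b)): 1.0 - pearson(votes[a], votes[b])
        for a, b in combinations(votes, 2)
    }
    clusters = [frozenset([name]) for name in votes]

    def linkage(c1, c2):
        # Average-linkage: mean distance over all cross-cluster pathway pairs.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        if linkage(c1, c2) > threshold:
            break
        clusters = [c for c in clusters if c != c1 and c != c2] + [c1 | c2]
    return clusters


if __name__ == "__main__":
    groups = average_linkage(VOTES, threshold=0.5)
    print(sorted(sorted(g) for g in groups))
    # → [['PW_A', 'PW_B'], ['PW_C', 'PW_D']]
```

With real data, `scipy.cluster.hierarchy.linkage` with `method="average"` on the same condensed distances would be the usual choice; the loop is written out here only to keep the sketch dependency-free and to make the average-linkage rule explicit.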

Cited Authors

  • Pang, H; Zhao, H

Published Date

  • February 6, 2008

Published In

  • BMC Bioinformatics

Volume

  • 9

Start Page

  • 87

PubMed ID

  • 18254968

Electronic International Standard Serial Number (EISSN)

  • 1471-2105

Digital Object Identifier (DOI)

  • 10.1186/1471-2105-9-87

Language

  • eng

Place of Publication

  • England