Mining phenotypes and informative genes from gene expression data
Mining microarray gene expression data is an important research topic in bioinformatics with broad applications. While most of the previous studies focus on clustering either genes or samples, it is interesting to ask whether we can partition the complete set of samples into exclusive groups (called phenotypes) and find a set of informative genes that can manifest the phenotype structure. In this paper, we propose a new problem of simultaneously mining phenotypes and informative genes from gene expression data. Some statistics-based metrics are proposed to measure the quality of the mining results. Two interesting algorithms are developed: the heuristic search and the mutual reinforcing adjustment method. We present an extensive performance study on both real-world data sets and synthetic data sets. The mining results from the two proposed methods are clearly better than those from the previous methods. They are ready for the real-world applications. Between the two methods, the mutual reinforcing adjustment method is in general more scalable, more effective and with better quality of the mining results. Copyright 2003 ACM.