Statistical properties on semiparametric regression for evaluating pathway effects
Most statistical methods for microarray data analysis consider one gene at a time, and they may miss subtle changes at the single gene level. This limitation may be overcome by considering a set of genes simultaneously where the gene sets are derived from prior biological knowledge. We call a pathway as a predefined set of genes that serve a particular cellular or physiological function. Limited work has been done in the regression settings to study the effects of clinical covariates and expression levels of genes in a pathway on a continuous clinical outcome. A semiparametric regression approach for identifying pathways related to a continuous outcome was proposed by Liu et al. (2007), who demonstrated the connection between a least squares kernel machine for nonparametric pathway effect and a restricted maximum likelihood (REML) for variance components. However, the asymptotic properties on a semiparametric regression for identifying pathway have never been studied. In this paper, we study the asymptotic properties of the parameter estimates on semiparametric regression and compare Liu et al.'s REML with our REML obtained from a profile likelihood. We prove that both approaches provide consistent estimators, have n convergence rate under regularity conditions, and have either an asymptotically normal distribution or a mixture of normal distributions. However, the estimators based on our REML obtained from a profile likelihood have a theoretically smaller mean squared error than those of Liu et al.'s REML. Simulation study supports this theoretical result. A profile restricted likelihood ratio test is also provided for the non-standard testing problem. We apply our approach to a type II diabetes data set (Mootha et al., 2003). © 2012 Elsevier B.V.
Volume / Issue
Start / End Page
International Standard Serial Number (ISSN)
Digital Object Identifier (DOI)