Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes.

Journal Article

Classification tree models are flexible analysis tools which have the ability to evaluate interactions among predictors as well as generate predictions for responses of interest. We describe Bayesian analysis of a specific class of tree models in which binary response data arise from a retrospective case-control design. We are also particularly interested in problems with potentially very many candidate predictors. This scenario is common in studies concerning gene expression data, which is a key motivating example context. Innovations here include the introduction of tree models that explicitly address and incorporate the retrospective design, and the use of nonparametric Bayesian models involving Dirichlet process priors on the distributions of predictor variables. The model specification influences the generation of trees through Bayes' factor based tests of association that determine significant binary partitions of nodes during a process of forward generation of trees. We describe this constructive process and discuss questions of generating and combining multiple trees via Bayesian model averaging for prediction. Additional discussion of parameter selection and sensitivity is given in the context of an example which concerns prediction of breast tumour status utilizing high-dimensional gene expression data; the example demonstrates the exploratory/explanatory uses of such models as well as their primary utility in prediction. Shortcomings of the approach and comparison with alternative tree modelling algorithms are also discussed, as are issues of modelling and computational extensions.
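As a rough illustration of the Bayes factor test of association mentioned in the abstract, the sketch below scores one candidate binary split of a node under a retrospective design. It compares the proportion of cases and the proportion of controls whose predictor value falls on one side of a threshold. The independent Beta(a, b) priors, the function names, and the default values are assumptions made for this example; the abstract does not give the article's exact specification.

```python
from math import lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bayes_factor(x_case, n_case, x_ctrl, n_ctrl, a=1.0, b=1.0):
    """Log Bayes factor for a candidate split (hypothetical helper).

    x_case / n_case: cases with predictor above the threshold / all cases.
    x_ctrl / n_ctrl: the same counts for controls.
    H1: cases and controls have different exceedance probabilities,
        each with an independent Beta(a, b) prior (assumed prior).
    H0: a single common probability with a Beta(a, b) prior.
    Binomial coefficients cancel in the ratio, so they are omitted.
    """
    # Marginal likelihood under H1: product of two Beta-binomial terms.
    log_m1 = (
        log_beta(a + x_case, b + n_case - x_case) - log_beta(a, b)
        + log_beta(a + x_ctrl, b + n_ctrl - x_ctrl) - log_beta(a, b)
    )
    # Marginal likelihood under H0: pooled counts, one Beta-binomial term.
    x_all = x_case + x_ctrl
    n_all = n_case + n_ctrl
    log_m0 = log_beta(a + x_all, b + n_all - x_all) - log_beta(a, b)
    return log_m1 - log_m0


# A split that separates cases from controls favours H1 (positive value);
# a split with identical proportions favours H0 (negative value).
strong = log_bayes_factor(18, 20, 2, 20)
null = log_bayes_factor(10, 20, 10, 20)
```

In a forward tree-generation scheme of the kind the abstract describes, a node would be split only when the best predictor and threshold pair yields a Bayes factor above a chosen cutoff; the cutoff itself is a tuning choice not stated here.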

Cited Authors

  • Pittman, J; Huang, E; Nevins, J; Wang, Q; West, M

Published Date

  • October 2004

Published In

  • Biostatistics

Volume / Issue

  • 5 / 4

Start / End Page

  • 587 - 601

PubMed ID

  • 15475421

International Standard Serial Number (ISSN)

  • 1465-4644

Digital Object Identifier (DOI)

  • 10.1093/biostatistics/kxh011

Language

  • eng

Conference Location

  • England