Predicting viral infection from high-dimensional biomarker trajectories

Journal Article

There is often interest in predicting an individual's latent health status based on high-dimensional biomarkers that vary over time. Motivated by time-course gene expression array data that we have collected in two influenza challenge studies performed with healthy human volunteers, we develop a novel time-aligned Bayesian dynamic factor analysis methodology. The time course trajectories in the gene expressions are related to a relatively low-dimensional vector of latent factors, which vary dynamically starting at the latent initiation time of infection. Using a nonparametric cure rate model for the latent initiation times, we allow selection of the genes in the viral response pathway, variability among individuals in infection times, and a subset of individuals who are not infected. As we demonstrate using held-out data, this statistical framework allows accurate predictions of infected individuals in advance of the development of clinical symptoms, without labeled data and even when the number of biomarkers vastly exceeds the number of individuals under study. Biological interpretation of several of the inferred pathways (factors) is provided. © 2011 American Statistical Association.

Cited Authors

  • Chen, M; Zaas, A; Woods, C; Ginsburg, GS; Lucas, J; Dunson, D; Carin, L

Published Date

  • 2011

Published In

  • Journal of the American Statistical Association

Volume / Issue

  • 106 / 496

Start / End Page

  • 1259 - 1279

PubMed ID

  • 23704802

International Standard Serial Number (ISSN)

  • 0162-1459

Digital Object Identifier (DOI)

  • 10.1198/jasa.2011.ap10611