Active learning for sampling in time-series experiments with application to gene expression analysis
Many time-series experiments seek to estimate a signal as a continuous function of time. In this paper, we address the sampling problem for such experiments: determining which time-points should be sampled in order to minimize the cost of data collection. We restrict our attention to a growing class of experiments that measure multiple signals at each time-point and in which raw materials or observations are archived first and selectively analyzed later, the analysis being the more expensive step. We present an active learning algorithm that iteratively chooses time-points to sample, using the uncertainty in the quality of the currently estimated time-dependent curve as the objective function. Using simulated data as well as gene expression data, we show that our algorithm performs well and can significantly reduce experimental cost without loss of information.
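
To make the iterative selection loop concrete, the following is a minimal sketch and not the algorithm developed in this paper. It assumes a query-by-committee style stand-in for the uncertainty measure: the disagreement among leave-one-out linear interpolants, summed over all signals. The names `committee_spread`, `select_time_points`, `measure`, `budget`, and `init_idx`, as well as the synthetic two-signal archive, are hypothetical and chosen only for illustration.

```python
import numpy as np


def committee_spread(t_s, y_s, t_query):
    """Disagreement among leave-one-out linear interpolants at t_query.

    Assumed uncertainty proxy (not the paper's objective).
    t_s must be sorted; y_s holds one signal measured at t_s.
    """
    committee = [np.interp(t_query, t_s, y_s)]  # interpolant using all samples
    for i in range(1, len(t_s) - 1):  # drop each interior sample in turn
        keep = np.arange(len(t_s)) != i
        committee.append(np.interp(t_query, t_s[keep], y_s[keep]))
    return np.std(np.vstack(committee), axis=0)


def select_time_points(t_all, measure, budget, init_idx):
    """Greedily choose which archived time-points to analyze.

    t_all    : sorted 1-D array of archived time-points
    measure  : callable(index) -> 1-D array, one value per signal (the costly step)
    budget   : total number of time-points to analyze
    init_idx : indices analyzed up front (should include both endpoints)
    """
    measured = {i: np.asarray(measure(i), dtype=float) for i in init_idx}
    while len(measured) < budget:
        remaining = [i for i in range(len(t_all)) if i not in measured]
        if not remaining:
            break
        idx = sorted(measured)
        t_s = t_all[idx]
        y_s = np.vstack([measured[i] for i in idx])  # shape (n_sampled, n_signals)
        # Sum the per-signal disagreement so every signal contributes.
        score = sum(
            committee_spread(t_s, y_s[:, g], t_all[remaining])
            for g in range(y_s.shape[1])
        )
        best = remaining[int(np.argmax(score))]
        measured[best] = np.asarray(measure(best), dtype=float)
    return sorted(measured)


# Synthetic stand-in for an archive: 21 time-points, 2 signals.
t_all = np.linspace(0.0, 10.0, 21)
archive = np.column_stack([np.sin(t_all), np.exp(-(t_all - 3.0) ** 2)])
chosen = select_time_points(t_all, lambda i: archive[i], budget=8, init_idx=[0, 10, 20])
```

Under this proxy, new samples tend to be placed next to already-analyzed time-points that their neighbors predict poorly, that is, where the estimated curves bend most. A flat stretch of the curve receives no attention, which a more careful uncertainty estimate would need to address.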