Scholars@Duke publication: Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling.

Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling.

Publication, Journal Article

Manolopoulou, I; Chan, C; West, M

Published in: Bayesian Analysis

January 2010

One of the challenges in using Markov chain Monte Carlo for model analysis in studies with very large data sets is the need to scan through the full data set at each iteration of the sampler, which can be computationally prohibitive. Several approaches have been developed to address this, typically by drawing computationally manageable subsamples of the data. Here we consider the specific case where most of the data from a mixture model provide little or no information about the parameters of interest, and we aim to select subsamples so that the information extracted is as relevant as possible. The motivating application arises in flow cytometry, where several measurements are available for each of a vast number of cells. Interest lies in identifying specific rare cell subtypes and characterizing them according to their corresponding markers. We present a Markov chain Monte Carlo approach in which an initial subsample of the full data set is used to guide selection sampling of a further set of observations targeted at a scientifically interesting, low-probability region. We define a Sequential Monte Carlo strategy in which the targeted subsample is augmented sequentially as estimates improve, and introduce a stopping rule for determining the size of the targeted subsample. An example from flow cytometry illustrates the ability of the approach to increase the resolution of inferences for rare cell subtypes.
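The abstract does not spell out the selection rule, so the following Python sketch illustrates only the general idea of targeted subsampling; it is not the authors' MCMC/SMC method. Everything in it is a hypothetical assumption: each observation is kept independently with probability pi(x), a Gaussian bump around a user-supplied target location `center` with scale `width`, floored at `floor` so no point is excluded outright; retained points carry inverse-probability (Horvitz-Thompson) weights 1/pi(x) so that quantities over the scanned data can still be estimated; and the data are scanned in batches, stopping once the estimated proportion of points in the target window changes by less than `tol`. In the paper the target region is informed by an initial subsample; here `center` is simply passed in. The names `selection_prob` and `targeted_subsample` and the synthetic data are illustrative only.

```python
import math
import random


def selection_prob(x, center, width, floor=0.01):
    """Hypothetical selection probability pi(x): close to 1 near `center`,
    decaying to `floor` far away, so every point keeps a nonzero chance."""
    bump = math.exp(-0.5 * ((x - center) / width) ** 2)
    return floor + (1.0 - floor) * bump


def targeted_subsample(data, center, width, floor=0.01,
                       batch_size=10_000, tol=0.02, seed=0):
    """Scan `data` in batches, keeping each point independently with
    probability pi(x) (Poisson sampling).

    Each kept point carries weight 1/pi(x), so the Horvitz-Thompson sum of
    weights over kept points inside the window |x - center| <= width is an
    unbiased estimate of the number of such points among those scanned.
    Scanning stops once the estimated window proportion changes by less than
    `tol` (relative) between consecutive batches; this rule is an assumption.
    Returns (kept points, their weights, estimated proportion, points scanned).
    """
    rng = random.Random(seed)
    kept, weights = [], []
    window_total = 0.0  # running Horvitz-Thompson estimate of the window count
    scanned = 0
    previous = None
    proportion = 0.0
    for start in range(0, len(data), batch_size):
        for x in data[start:start + batch_size]:
            p = selection_prob(x, center, width, floor)
            if rng.random() < p:
                kept.append(x)
                weights.append(1.0 / p)
                if abs(x - center) <= width:
                    window_total += 1.0 / p
        scanned = min(start + batch_size, len(data))
        proportion = window_total / scanned
        # Hypothetical stopping rule: stop when the estimate has stabilized.
        if previous is not None and previous > 0 and abs(proportion - previous) / previous < tol:
            break
        previous = proportion
    return kept, weights, proportion, scanned


if __name__ == "__main__":
    gen = random.Random(1)
    # Synthetic data: a large "bulk" population plus a rare subpopulation near 5.
    data = [gen.gauss(0.0, 1.0) for _ in range(100_000)]
    data += [gen.gauss(5.0, 0.3) for _ in range(1_000)]
    gen.shuffle(data)
    kept, weights, prop, scanned = targeted_subsample(data, center=5.0, width=0.5)
    true_prop = sum(abs(x - 5.0) <= 0.5 for x in data) / len(data)
    print(f"kept {len(kept)} of {scanned} scanned; "
          f"estimated window proportion {prop:.4f} vs full-data {true_prop:.4f}")
```

Most retained points come from the rare region, while the bulk is represented only by a thin, reweighted sample. Setting `tol=0.0` forces a scan of the whole data set; a positive `tol` trades completeness for fewer scanned observations, loosely echoing the motivation for a stopping rule.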

Duke Scholars

Author: Mike West (Statistical Science)

Published In: Bayesian Analysis
EISSN: 1931-6690
ISSN: 1936-0975
Publication Date: January 2010
Volume: 5
Issue: 3
Start / End Page: 1 / 22

Related Subject Headings

Statistics & Probability
4905 Statistics
0104 Statistics

Citation

APA: Manolopoulou, I., Chan, C., & West, M. (2010). Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling. Bayesian Analysis, 5(3), 1–22.

Chicago: Manolopoulou, Ioanna, Cliburn Chan, and Mike West. “Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling.” Bayesian Analysis 5, no. 3 (January 2010): 1–22.

ICMJE: Manolopoulou I, Chan C, West M. Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling. Bayesian analysis. 2010 Jan;5(3):1–22.

MLA: Manolopoulou, Ioanna, et al. “Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling.” Bayesian Analysis, vol. 5, no. 3, Jan. 2010, pp. 1–22.

NLM: Manolopoulou I, Chan C, West M. Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling. Bayesian analysis. 2010 Jan;5(3):1–22.
