Multisource single-cell data integration by MAW barycenter for Gaussian mixture models.

Publication, Journal Article
Lin, L; Shi, W; Ye, J; Li, J
Published in: Biometrics
June 2023

One key challenge encountered in single-cell data clustering is to combine clustering results of data sets acquired from multiple sources. We propose to represent the clustering result of each data set by a Gaussian mixture model (GMM) and produce an integrated result based on the notion of Wasserstein barycenter. However, the precise barycenter of GMMs, a distribution on the same sample space, is computationally infeasible to solve. Importantly, the barycenter of GMMs may not be a GMM containing a reasonable number of components. We thus propose to use the minimized aggregated Wasserstein (MAW) distance to approximate the Wasserstein metric and develop a new algorithm for computing the barycenter of GMMs under MAW. Recent theoretical advances further justify using the MAW distance as an approximation for the Wasserstein metric between GMMs. We also prove that the MAW barycenter of GMMs has the same expectation as the Wasserstein barycenter. Our proposed algorithm for clustering integration scales well with the data dimension and the number of mixture components, with complexity independent of data size. We demonstrate that the new method achieves better clustering results on several single-cell RNA-seq data sets than some other popular methods.
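The abstract does not spell out how the MAW distance is computed, so the following is only a minimal sketch under stated assumptions. It assumes the squared MAW distance between two GMMs is the optimal-transport cost between their component weights, where the cost of matching two components is the closed-form squared 2-Wasserstein distance between the corresponding Gaussians. The function names `gaussian_w2_sq` and `maw_sq` are hypothetical, and the generic linear-programming solver is a stand-in rather than the authors' algorithm; the barycenter computation itself is not shown.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog


def gaussian_w2_sq(mu1, cov1, mu2, cov2):
    """Squared 2-Wasserstein distance between two Gaussians (closed form)."""
    root1 = np.real(sqrtm(cov1))
    cross = np.real(sqrtm(root1 @ cov2 @ root1))
    mean_term = np.sum((np.asarray(mu1) - np.asarray(mu2)) ** 2)
    cov_term = np.trace(cov1 + cov2 - 2.0 * cross)
    return float(mean_term + cov_term)


def maw_sq(weights_a, means_a, covs_a, weights_b, means_b, covs_b):
    """Assumed squared MAW distance: discrete optimal transport between
    component weights with Gaussian W2^2 as the ground cost.

    Returns (squared distance, coupling matrix of shape (ka, kb)).
    """
    ka, kb = len(weights_a), len(weights_b)

    # Pairwise cost between every component of A and every component of B.
    cost = np.array([
        [gaussian_w2_sq(means_a[i], covs_a[i], means_b[j], covs_b[j])
         for j in range(kb)]
        for i in range(ka)
    ])

    # Marginal constraints on the flattened (row-major) coupling:
    # row i sums to weights_a[i], column j sums to weights_b[j].
    a_eq = np.zeros((ka + kb, ka * kb))
    for i in range(ka):
        a_eq[i, i * kb:(i + 1) * kb] = 1.0
    for j in range(kb):
        a_eq[ka + j, j::kb] = 1.0
    b_eq = np.concatenate([weights_a, weights_b])

    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    return result.fun, result.x.reshape(ka, kb)


if __name__ == "__main__":
    # Two-component mixture versus a single Gaussian, in one dimension.
    unit = np.eye(1)
    dist_sq, coupling = maw_sq(
        np.array([0.5, 0.5]), [np.zeros(1), np.array([10.0])], [unit, unit],
        np.array([1.0]), [np.zeros(1)], [unit],
    )
    print(dist_sq)
    print(coupling)
```

Because the linear program involves only the mixture components and not the individual cells, the cost of this step depends on the number of components and the dimension, which is consistent with the abstract's remark that complexity is independent of data size.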

Published In

Biometrics

DOI

10.1111/biom.13630

EISSN

1541-0420

Publication Date

June 2023

Volume

79

Issue

2

Start / End Page

866 / 877

Location

England

Related Subject Headings

  • Statistics & Probability
  • Normal Distribution
  • Cluster Analysis
  • Algorithms
  • 4905 Statistics
  • 0199 Other Mathematical Sciences
  • 0104 Statistics
 

Citation

APA: Lin, L., Shi, W., Ye, J., & Li, J. (2023). Multisource single-cell data integration by MAW barycenter for Gaussian mixture models. Biometrics, 79(2), 866–877. https://doi.org/10.1111/biom.13630

Chicago: Lin, Lin, Wei Shi, Jianbo Ye, and Jia Li. “Multisource single-cell data integration by MAW barycenter for Gaussian mixture models.” Biometrics 79, no. 2 (June 2023): 866–77. https://doi.org/10.1111/biom.13630.

ICMJE: Lin L, Shi W, Ye J, Li J. Multisource single-cell data integration by MAW barycenter for Gaussian mixture models. Biometrics. 2023 Jun;79(2):866–77.

MLA: Lin, Lin, et al. “Multisource single-cell data integration by MAW barycenter for Gaussian mixture models.” Biometrics, vol. 79, no. 2, June 2023, pp. 866–77. PubMed, doi:10.1111/biom.13630.

NLM: Lin L, Shi W, Ye J, Li J. Multisource single-cell data integration by MAW barycenter for Gaussian mixture models. Biometrics. 2023 Jun;79(2):866–877.