CPS analysis: self-contained validation of biomedical data clustering.

Publication, Journal Article
Zhang, L; Lin, L; Li, J
Published in: Bioinformatics
June 1, 2020

MOTIVATION: Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community.

RESULTS: We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for high-dimensional data and to provide a more comprehensive view of potentially interesting subgroups in the data. Applying CPS analysis to three usage scenarios for biomedical data, we demonstrate that it is more effective for evaluating the uncertainty of clusters than state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods.

AVAILABILITY AND IMPLEMENTATION: The method is implemented in an R package called OTclust, available on CRAN.

CONTACT: lzz46@psu.edu or jiali@psu.edu.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Published In

Bioinformatics

DOI

10.1093/bioinformatics/btaa165

EISSN

1367-4811

Publication Date

June 1, 2020

Volume

36

Issue

11

Start / End Page

3516 / 3521

Location

England

Related Subject Headings

  • Software
  • Cluster Analysis
  • Bioinformatics
  • 49 Mathematical sciences
  • 46 Information and computing sciences
  • 31 Biological sciences
  • 08 Information and Computing Sciences
  • 06 Biological Sciences
  • 01 Mathematical Sciences
 

Citation

APA: Zhang, L., Lin, L., & Li, J. (2020). CPS analysis: self-contained validation of biomedical data clustering. Bioinformatics, 36(11), 3516–3521. https://doi.org/10.1093/bioinformatics/btaa165
Chicago: Zhang, Lixiang, Lin Lin, and Jia Li. “CPS analysis: self-contained validation of biomedical data clustering.” Bioinformatics 36, no. 11 (June 1, 2020): 3516–21. https://doi.org/10.1093/bioinformatics/btaa165.
ICMJE: Zhang L, Lin L, Li J. CPS analysis: self-contained validation of biomedical data clustering. Bioinformatics. 2020 Jun 1;36(11):3516–21.
MLA: Zhang, Lixiang, et al. “CPS analysis: self-contained validation of biomedical data clustering.” Bioinformatics, vol. 36, no. 11, June 2020, pp. 3516–21. PubMed, doi:10.1093/bioinformatics/btaa165.
NLM: Zhang L, Lin L, Li J. CPS analysis: self-contained validation of biomedical data clustering. Bioinformatics. 2020 Jun 1;36(11):3516–3521.
