CPS analysis: self-contained validation of biomedical data clustering.
MOTIVATION: Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community. RESULTS: We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for high-dimensional data and to provide a more comprehensive view of potentially interesting subgroups in the data. Applying CPS analysis to three usage scenarios for biomedical data, we demonstrate that it is more effective for evaluating the uncertainty of clusters than state-of-the-art measures. We also showcase how to use CPS analysis to select data generation technologies or visualization methods. AVAILABILITY AND IMPLEMENTATION: The method is implemented in an R package called OTclust, available on CRAN. CONTACT: lzz46@psu.edu or jiali@psu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Related Subject Headings
- Software
- Cluster Analysis
- Bioinformatics
- 49 Mathematical sciences
- 46 Information and computing sciences
- 31 Biological sciences
- 08 Information and Computing Sciences
- 06 Biological Sciences
- 01 Mathematical Sciences