Finding multiple stable clusterings
Multi-clustering, which tries to find multiple independent ways to partition a data set into groups, has enjoyed many applications, such as customer relationship management, bioinformatics and healthcare informatics. This paper addresses two fundamental questions in multi-clustering: How to model quality of clusterings and how to find multiple stable clusterings (MSC). We introduce to multi-clustering the notion of clustering stability based on Laplacian eigengap, which was originally used by the regularized spectral learning method for similarity matrix learning. We mathematically prove that the larger the eigengap, the more stable the clustering. Furthermore, we propose a novel multi-clustering method MSC. An advantage of our method comparing to the state-of-the-art multi-clustering methods is that our method can provide users a feature subspace to understand each clustering solution. Another advantage is that MSC does not need users to specify the number of clusters and the number of alternative clusterings, which is usually difficult for users without any guidance. Our method can heuristically estimate the number of stable clusterings in a data set. We also discuss a practical way to make MSC applicable to large-scale data. We report an extensive empirical study that clearly demonstrates the effectiveness of our method.
Duke Scholars
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- Information Systems
- 46 Information and computing sciences
- 0806 Information Systems
- 0801 Artificial Intelligence and Image Processing
Citation
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- Information Systems
- 46 Information and computing sciences
- 0806 Information Systems
- 0801 Artificial Intelligence and Image Processing