Scaffoldings and Spines: Organizing High-Dimensional Data Using Cover Trees, Local Principal Component Analysis, and Persistent Homology

Journal Article

We propose a flexible and multi-scale method for organizing, visualizing, and understanding datasets sampled from or near stratified spaces. The first part of the algorithm produces a cover tree using adaptive thresholds based on a combination of multi-scale local principal component analysis and topological data analysis. The resulting cover tree nodes consist of points within or near the same stratum of the stratified space. They are then connected to form a \emph{scaffolding} graph, which is then simplified and collapsed down into a \emph{spine} graph. From this latter graph the stratified structure becomes apparent. We demonstrate our technique on several synthetic point cloud examples and we use it to understand song structure in musical audio data.

Full Text

Duke Authors

Cited Authors

  • Bendich, P; Gasparovic, E; Tralie, CJ; Harer, J