G-Mapper: Learning a Cover in the Mapper Construction

Published as an arXiv preprint [cs.LG]

Abstract:

The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. However, the Mapper algorithm requires tuning several parameters in order to generate a "nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by repeatedly splitting the cover according to a statistical test for normality. Our algorithm is based on G-means clustering, which searches for the optimal number of clusters in k-means by iteratively applying the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model to choose the cover according to the distribution of the given data. Experiments on synthetic and real-world datasets demonstrate that our algorithm generates covers for which the Mapper graphs retain the essence of the datasets, while also running fast.
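
The splitting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional lens, uses the raw Anderson-Darling statistic with an illustrative threshold rather than a significance level, fits the two-component Gaussian mixture with a plain EM loop, and places the cut and overlap by a simple rule chosen for this example. All function names and parameters (`learn_cover`, `threshold`, `overlap`, `min_points`, `max_intervals`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def anderson_darling(values):
    """Raw Anderson-Darling statistic A^2 against a normal fitted to `values`."""
    xs = sorted(values)
    n = len(xs)
    mu, sigma = fmean(xs), stdev(xs)
    if sigma == 0:
        return 0.0  # degenerate sample: treat as "do not split"
    std_normal = NormalDist()
    eps = 1e-12  # keeps the logarithms finite for extreme points
    cdf = [min(max(std_normal.cdf((x - mu) / sigma), eps), 1 - eps) for x in xs]
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
                for i in range(n))
    return -n - total / n


def fit_two_gaussians(values, iters=50):
    """Fit a two-component 1D Gaussian mixture with plain EM (illustrative)."""
    xs = sorted(values)
    half = len(xs) // 2
    parts = [xs[:half], xs[half:]]  # initialise from the lower and upper halves
    means = [fmean(p) for p in parts]
    sds = [max(stdev(p), 1e-6) for p in parts]
    weights = [0.5, 0.5]
    for _ in range(iters):
        comps = [NormalDist(m, s) for m, s in zip(means, sds)]
        resp = []  # E-step: responsibility of each component for each point
        for x in xs:
            dens = [w * c.pdf(x) for w, c in zip(weights, comps)]
            norm = sum(dens) or 1e-300
            resp.append([d / norm for d in dens])
        for k in range(2):  # M-step: re-estimate weight, mean, and deviation
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
    return means, sds


def split_interval(interval, values, overlap):
    """Split one interval into two overlapping ones (assumed splitting rule)."""
    a, b = interval
    means, sds = fit_two_gaussians(values)
    (m_lo, s_lo), (m_hi, s_hi) = sorted(zip(means, sds))
    # Cut at the point that is equally many standard deviations from each mean.
    cut = (m_lo * s_hi + m_hi * s_lo) / (s_lo + s_hi)
    pad = overlap * min(s_lo, s_hi)
    return [(a, min(cut + pad, b)), (max(cut - pad, a), b)]


def learn_cover(values, threshold=10.0, overlap=0.5, min_points=10, max_intervals=16):
    """Split intervals until the points in each one look roughly normal."""
    pending = [(min(values), max(values))]
    final = []
    while pending:
        a, b = pending.pop()
        inside = [x for x in values if a <= x <= b]
        room = len(final) + len(pending) + 2 <= max_intervals
        if len(inside) < min_points or not room or anderson_darling(inside) <= threshold:
            final.append((a, b))
            continue
        pending.extend(split_interval((a, b), inside, overlap))
    return sorted(final)


random.seed(0)
sample = ([random.gauss(0, 1) for _ in range(300)]
          + [random.gauss(10, 1) for _ in range(300)])
print(learn_cover(sample))
```

On a sample drawn from two well-separated Gaussians, the sketch should stop after a single split, leaving two overlapping intervals; on a single Gaussian it should leave the initial interval untouched. The `max_intervals` cap is a safeguard added here so the loop always terminates.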

Cite as: Enrique Alvarado, Robin Belton, Emily Fischer, Kang-Ju Lee, et al. "G-Mapper: Learning a Cover in the Mapper Construction." arXiv [cs.LG], 2024.
