TDA + ML: Utilizing Topological Structures of Data for Machine Learning
Date:
Talk at MSU TDA Seminar, Fall 2020, East Lansing, Michigan
Abstract:
The ever-increasing size and complexity of data pose fundamental challenges to existing machine learning techniques, which are typically designed to work with data in vector form. We believe that topological data analysis (TDA) can provide a different perspective for addressing these challenges. TDA is a multidisciplinary field that studies the topological structures of data. TDA techniques can be particularly powerful in handling data modeled as trees, graphs, simplicial complexes, hypergraphs, or ensembles of these objects.
The theme of this dissertation is to bring together the fields of TDA and machine learning. Throughout this dissertation, we describe ways to integrate ideas from TDA into different stages of a machine learning pipeline. We first present unsupervised and semisupervised learning algorithms that leverage the topological structure of the data. Then, we present methods to compare complex objects such as graphs and their ensembles. We describe ways to extract topological summaries from these objects and utilize them as input features in machine learning.
Our specific contributions include the following:
- We present a spectral sparsification algorithm for simplicial complexes and algorithms for unsupervised and semisupervised learning on simplicial complexes, specifically, spectral clustering and label propagation.
- We present ways to utilize topological features of brain networks in statistical inference and machine learning tasks such as classification and regression.
- We present methods to evaluate the structural variability within an ensemble of graphs arising from graph reduction algorithms.
Our vision is to develop new machine learning frameworks that seamlessly integrate ideas from TDA.