Joint Geometric and Topological Analysis of Hierarchical Datasets

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In a world abundant with diverse data arising from complex acquisition techniques, there is a growing need for new data analysis methods. In this paper, we focus on high-dimensional data that are organized into several hierarchical datasets. We assume that each dataset consists of complex samples, and that every sample has a distinct irregular structure modeled by a graph. The main novelty of this work lies in the combination of two complementary, powerful data-analytic approaches: topological data analysis (TDA) and geometric manifold learning. Geometry primarily contains local information, while topology inherently provides global descriptors. Based on this combination, we present a method for building an informative representation of hierarchical datasets. At the finer (sample) level, we devise a new metric between samples based on manifold learning that facilitates quantitative structural analysis. At the coarser (dataset) level, we employ TDA to extract qualitative structural information from the datasets. We showcase the applicability and advantages of our method on simulated data and on a corpus of hyper-spectral images. We show that an ensemble of hyper-spectral images exhibits a hierarchical structure that fits the considered setting well. In addition, we show that our new method gives rise to superior classification results compared to state-of-the-art methods.
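
As a rough illustration of the manifold-learning ingredient named in the keywords (diffusion maps), the following minimal sketch embeds a point cloud using a Gaussian kernel. It is a generic textbook construction, not the metric proposed in the paper; the function name `diffusion_map` and the parameters `epsilon`, `n_components`, and `t` are assumptions introduced here for illustration only.

```python
import numpy as np


def diffusion_map(points, epsilon, n_components=2, t=1):
    """Generic diffusion-map embedding (illustrative sketch, not the paper's method).

    points: (n, d) array; epsilon: Gaussian kernel scale (assumed parameter);
    t: diffusion time. Returns (eigenvalues, embedding).
    """
    # Pairwise squared Euclidean distances, clipped to avoid tiny negatives.
    sq = np.sum(points ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * points @ points.T, 0.0)

    # Gaussian affinity kernel and node degrees.
    kernel = np.exp(-d2 / epsilon)
    deg = kernel.sum(axis=1)

    # Symmetric normalization S = D^{-1/2} K D^{-1/2}; it shares its
    # eigenvalues with the row-stochastic Markov matrix P = D^{-1} K.
    sym = kernel / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(sym)

    # Sort eigenpairs in descending order of eigenvalue.
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Convert to right eigenvectors of P.
    psi = vecs / np.sqrt(deg)[:, None]

    # Drop the trivial constant eigenvector (eigenvalue 1) and scale by
    # eigenvalue^t to obtain the diffusion coordinates.
    keep = slice(1, n_components + 1)
    return vals[keep], psi[:, keep] * vals[keep] ** t
```

Under these assumptions, a call such as `diffusion_map(points, epsilon=2.0)` returns the leading non-trivial eigenvalues and a two-dimensional embedding; Euclidean distances in that embedding approximate diffusion distances between the input points. How the paper builds its sample-level metric from such tools is not specified in the abstract.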

Original language: English
Title of host publication: MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT III
Editors: Nuria Oliver, Fernando Pérez-Cruz, Stefan Kramer, Jesse Read, Jose A. Lozano
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 478-493
Number of pages: 16
ISBN (Print): 9783030865221
State: Published - 2021
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021 - Virtual, Online
Duration: 13 Sep 2021 to 17 Sep 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12977 LNAI

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021
City: Virtual, Online
Period: 13/09/21 to 17/09/21

Keywords

  • Diffusion maps
  • Geometric learning
  • Manifold learning
  • Persistent homology
  • Topological data analysis

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
