Data-Driven Tree Transforms and Metrics

Gal Mishne, Ronen Talmon, Israel Cohen, Ronald R. Coifman, Yuval Kluger

Research output: Contribution to journal › Article › peer-review

Abstract

We consider the analysis of high-dimensional data given in the form of a matrix with columns consisting of observations and rows consisting of features. Often the data is such that the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organization and analysis. In this paper, our goal is to organize the data by defining an appropriate representation and metric such that they respect the smoothness and structure underlying the data. We also aim to generalize the joint clustering of observations and features in the case where the data does not fall into clear disjoint groups. For this purpose, we propose multiscale data-driven transforms and metrics based on trees. Their construction is implemented in an iterative refinement procedure that exploits the co-dependencies between features and observations. Beyond the organization of a single dataset, our approach enables us to transfer the organization learned from one dataset to another and to integrate several datasets together. We present an application to breast cancer gene expression analysis: learning metrics on the genes to cluster the tumor samples into cancer subtypes and validating the joint organization of both the genes and the samples. We demonstrate that using our approach to combine information from multiple gene expression cohorts, acquired by different profiling technologies, improves the clustering of tumor samples.

Original language: English
Article number: 8015135
Pages (from-to): 451-466
Number of pages: 16
Journal: IEEE Transactions on Signal and Information Processing over Networks
Volume: 4
Issue number: 3
State: Published - Sep 2018

Keywords

  • Gene expression
  • geometric analysis
  • graph signal processing
  • multiscale representations
  • partition trees

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Information Systems
  • Computer Networks and Communications
