Applying Compression to Hierarchical Clustering

Gilad Baruch, Shmuel Tomi Klein, Dana Shapira

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Hierarchical Clustering is widely used in Machine Learning and Data Mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We suggest a data compression application of hierarchical clustering with a double usage of the xoring operations defining the Hamming distance used in the clustering process, extending it also to be used to transform the vector in one node into a more compressible form, as a function of the vector in the parent node. Compression is then achieved by run-length encoding, followed by optional Huffman coding, and we show how the compressed file may be processed directly, without decompression.
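
The core idea in the abstract, XORing a node's vector with its parent's vector and then run-length encoding the result, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`xor_bits`, `hamming`, `rle_encode`, `rle_decode`), the bit-list representation, and the convention that the run list starts with a run of zeros are all assumptions made for illustration, and the optional Huffman stage is omitted.

```python
def xor_bits(a, b):
    """Bitwise XOR of two equal-length lists of 0/1 values."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x ^ y for x, y in zip(a, b)]


def hamming(a, b):
    """Hamming distance: the number of 1-bits in the XOR of a and b."""
    return sum(xor_bits(a, b))


def rle_encode(bits):
    """Lengths of alternating runs, starting with a (possibly empty) run of 0s."""
    runs = []
    current, count = 0, 0
    for bit in bits:
        if bit == current:
            count += 1
        else:
            runs.append(count)
            current, count = bit, 1
    runs.append(count)
    return runs


def rle_decode(runs):
    """Inverse of rle_encode: expand alternating run lengths back into bits."""
    bits, value = [], 0
    for n in runs:
        bits.extend([value] * n)
        value ^= 1
    return bits


# Hypothetical parent and child vectors that differ in a single position.
parent = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
child = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1]

delta = xor_bits(child, parent)   # mostly zeros when child is close to parent
runs = rle_encode(delta)          # a few long runs instead of 12 raw bits
restored = xor_bits(rle_decode(runs), parent)
assert restored == child
```

Because clustering places similar vectors near each other in the tree, the XOR of a child with its parent tends to contain few 1-bits, which is what makes the run-length representation short. The same XOR also yields the Hamming distance, so one operation serves both purposes.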

Original language: English
Title of host publication: Similarity Search and Applications - 11th International Conference, SISAP 2018, Proceedings
Editors: Stéphane Marchand-Maillet, Yasin N. Silva, Edgar Chávez
Publisher: Springer Verlag
Pages: 151-162
Number of pages: 12
ISBN (Print): 9783030022235
State: Published - 2018
Event: 11th International Conference on Similarity Search and Applications, SISAP 2018 - Lima, Peru
Duration: 7 Oct 2018 – 9 Oct 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11223 LNCS

Conference

Conference: 11th International Conference on Similarity Search and Applications, SISAP 2018
Country/Territory: Peru
City: Lima
Period: 7/10/18 – 9/10/18

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
