Coreset for Line-Sets Clustering

Sagi Lotan, Ernesto Evgeniy Sanches Shayda, Dan Feldman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The input to the line-sets k-median problem is an integer k ≥ 1, and a set 𝓛 = {L_1, ..., L_n} that contains n sets of lines in ℝ^d. The goal is to compute a set C of k centers (points in ℝ^d) that minimizes the sum Σ_{L∈𝓛} min_{ℓ∈L, c∈C} dist(ℓ, c) of Euclidean distances from each set to its closest center, where dist(ℓ, c) := min_{x∈ℓ} ∥x − c∥_2. An ε-coreset for this problem is a weighted subset of sets in 𝓛 that approximates this sum up to a 1 ± ε multiplicative factor, for every set C of k centers. We prove that every such input set 𝓛 has a small ε-coreset, and provide the first coreset construction for this problem and its variants. The coreset consists of O(log² n) weighted line-sets from 𝓛, and is constructed in O(n log n) time for every fixed d, k ≥ 1 and ε ∈ (0, 1). The main technique is based on a novel reduction to a “fair clustering” of colored points to colored centers. We then provide a coreset for this coloring problem, which may be of independent interest. Open source code and experiments are also provided.
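
The objective defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' released code and does not construct a coreset; it only evaluates the (optionally weighted) line-sets k-median cost for a given set of centers. The representation of a line as a pair (p, v) of a point and a nonzero direction vector, and the names dist_point_line and line_sets_cost, are assumptions made for this illustration.

```python
import math

def dist_point_line(c, p, v):
    """Euclidean distance from point c to the line {p + t*v : t real}.

    Assumes v is a nonzero direction vector (hypothetical representation).
    """
    w = [ci - pi for ci, pi in zip(c, p)]
    # Parameter t of the orthogonal projection of c onto the line.
    t = sum(wi * vi for wi, vi in zip(w, v)) / sum(vi * vi for vi in v)
    return math.sqrt(sum((wi - t * vi) ** 2 for wi, vi in zip(w, v)))

def line_sets_cost(line_sets, centers, weights=None):
    """Sum over line-sets of the smallest line-to-center distance.

    line_sets: list of sets, each a list of (p, v) lines.
    weights:   optional per-set weights, as a weighted coreset would carry.
    """
    if weights is None:
        weights = [1.0] * len(line_sets)
    return sum(
        w * min(dist_point_line(c, p, v) for (p, v) in lines for c in centers)
        for w, lines in zip(weights, line_sets)
    )

# Two line-sets in the plane: {x-axis} and {y-axis, the line y = x}.
line_sets = [
    [((0, 0), (1, 0))],
    [((0, 0), (0, 1)), ((5, 5), (1, 1))],
]
cost = line_sets_cost(line_sets, centers=[(3, 4)])
```

Under these assumptions, a weighted subset S with weights w would qualify as an ε-coreset if line_sets_cost(S, C, w) stays within a 1 ± ε factor of line_sets_cost(line_sets, C) for every choice of k centers C.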

Original language: American English
Title of host publication: Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Editors: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Publisher: Neural information processing systems foundation
ISBN (Electronic): 9781713871088
State: Published - 2022
Event: 36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States
Duration: 28 Nov 2022 to 9 Dec 2022

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 35

Conference

Conference: 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/Territory: United States
City: New Orleans
Period: 28/11/22 to 9/12/22

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
