TY - GEN
T1 - Coreset for Line-Sets Clustering
AU - Lotan, Sagi
AU - Shayda, Ernesto Evgeniy Sanches
AU - Feldman, Dan
N1 - Publisher Copyright: © 2022 Neural information processing systems foundation. All rights reserved.
PY - 2022
Y1 - 2022
AB - The input to the line-sets k-median problem is an integer k ≥ 1 and a set 𝓛 = {L_1, ..., L_n} that contains n sets of lines in ℝ^d. The goal is to compute a set C of k centers (points in ℝ^d) that minimizes the sum Σ_{L∈𝓛} min_{ℓ∈L, c∈C} dist(ℓ, c) of Euclidean distances from each set to its closest center, where dist(ℓ, c) := min_{x∈ℓ} ∥x - c∥_2. An ε-coreset for this problem is a weighted subset of the sets in 𝓛 that approximates this sum up to a 1 ± ε multiplicative factor, for every set C of k centers. We prove that every such input set 𝓛 has a small ε-coreset, and provide the first coreset construction for this problem and its variants. The coreset consists of O(log^2 n) weighted line-sets from 𝓛, and is constructed in O(n log n) time for every fixed d, k ≥ 1 and ε ∈ (0, 1). The main technique is based on a novel reduction to a “fair clustering” of colored points to colored centers. We then provide a coreset for this coloring problem, which may be of independent interest. Open source code and experiments are also provided.
UR - http://www.scopus.com/inward/record.url?scp=85163204087&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
A2 - Koyejo, S.
A2 - Mohamed, S.
A2 - Agarwal, A.
A2 - Belgrave, D.
A2 - Cho, K.
A2 - Oh, A.
PB - Neural information processing systems foundation
T2 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Y2 - 28 November 2022 through 9 December 2022
ER -