TY - UNPB
T1 - A Constant Approximation Algorithm for Sequential No-Substitution k-Median Clustering under a Random Arrival Order
AU - Hess, Tom
AU - Moshkovitz, Michal
AU - Sabato, Sivan
PY - 2021/6/6
Y1 - 2021/6/6
N2 - We study k-median clustering under the sequential no-substitution setting. In this setting, a data stream is sequentially observed, and some of the points are selected by the algorithm as cluster centers. However, a point can be selected as a center only immediately after it is observed, before observing the next point. In addition, a selected center cannot be substituted later. We give a new algorithm for this setting that obtains a constant approximation factor on the optimal risk under a random arrival order. This is the first such algorithm that holds without any assumptions on the input data and selects a non-trivial number of centers. The number of selected centers is quasi-linear in k. Our algorithm and analysis are based on a careful risk estimation that avoids outliers, a new concept of a linear bin division, and repeated calculations using an offline clustering algorithm.
KW - Computer Science - Machine Learning
KW - Statistics - Machine Learning
U2 - 10.48550/arXiv.2102.04050
DO - 10.48550/arXiv.2102.04050
M3 - Preprint
BT - A Constant Approximation Algorithm for Sequential No-Substitution k-Median Clustering under a Random Arrival Order
ER -