Coresets for Data Discretization and Sine Wave Fitting

Alaa Maalouf, Murad Tukan, Eric Price, Daniel Kane, Dan Feldman

Research output: Contribution to journalConference articlepeer-review

Abstract

In the monitoring problem, the input is an unbounded stream P = p1, p2 · · · of integers in [N]:= {1, · · ·, N}, that are obtained from a sensor (such as GPS or heart beats of a human). The goal (e.g., for anomaly detection) is to approximate the n points received so far in P by a single frequency sin, e.g. mincC cost(P, c) + λ(c), where cost(P, c) = -ni=1 sin2(2πN pic), C ⊆ [N] is a feasible set of solutions, and λ is a given regularization function. For any approximation error ε > 0, we prove that every set P of n integers has a weighted subset S ⊆ P (sometimes called core-set) of cardinality |S| ∈ O(log(N)O(1)) that approximates cost(P, c) (for every c ∈ [N]) up to a multiplicative factor of 1 ±ε. Using known coreset techniques, this implies streaming algorithms using only O((log(N) log(n))O(1)) memory. Our results hold for a large family of functions. Experimental results and open source code are provided.

Original languageAmerican English
Pages (from-to)10622-10639
Number of pages18
JournalProceedings of Machine Learning Research
Volume151
StatePublished - 2022
Event25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022 - Virtual, Online, Spain
Duration: 28 Mar 202230 Mar 2022

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Coresets for Data Discretization and Sine Wave Fitting'. Together they form a unique fingerprint.

Cite this