Abstract
We introduce a new distance-preserving compact representation of multidimensional point-sets. Given n points in a d-dimensional space where each coordinate is represented using B bits (i.e., dB bits per point), it produces a representation of size O(d log(dB/ε) + log n) bits per point from which one can approximate the distances up to a factor of 1 ± ε. Our algorithm almost matches the recent bound of [6] while being much simpler. We compare our algorithm to Product Quantization (PQ) [7], a state-of-the-art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT (used in [7]), MNIST [11], New York City taxi time series [4] and a synthetic one-dimensional data set embedded in a high-dimensional space. With appropriately tuned parameters, our algorithm produces representations that are comparable to or better than those produced by PQ, while having provable guarantees on its performance.
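To give a feel for the size bound stated in the abstract, the sketch below compares the raw dB bits per point with the expression d log(dB/ε) + log n. It is only an illustration under stated assumptions: the constant hidden by the O(·) notation is not given in the abstract and is set to 1 here, logarithms are taken base 2, and the function names and parameter values are hypothetical rather than taken from the paper's experiments.

```python
import math


def raw_bits_per_point(d: int, B: int) -> int:
    """Uncompressed size: d coordinates at B bits each."""
    return d * B


def bound_bits_per_point(d: int, B: int, n: int, eps: float, c: float = 1.0) -> float:
    """Evaluate c * (d * log2(d*B/eps) + log2(n)).

    The constant c is an assumption: big-O notation hides it, and the
    abstract does not state its value.
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    return c * (d * math.log2(d * B / eps) + math.log2(n))


if __name__ == "__main__":
    # Hypothetical parameters chosen for illustration only.
    d, B, n, eps = 128, 64, 1_000_000, 0.1
    print("raw bits per point:  ", raw_bits_per_point(d, B))
    print("bound (c = 1), bits: ", round(bound_bits_per_point(d, B, n, eps)))
```

With c = 1, doubling B doubles the raw size but adds only d bits to the bound, since B enters the bound only inside the logarithm. For small B the expression can exceed dB, so the comparison is informative mainly when coordinates are stored at high precision.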
Original language | English |
---|---|
Pages (from-to) | 2618-2627 |
Number of pages | 10 |
Journal | Advances in Neural Information Processing Systems |
Volume | 2017-December |
State | Published - 2017 |
Externally published | Yes |
Event | 31st Annual Conference on Neural Information Processing Systems, NIPS 2017 - Long Beach, United States. Duration: 4 Dec 2017 → 9 Dec 2017 |
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Information Systems
- Signal Processing