Abstract
We consider the (1 + ε)-approximate nearest neighbor search problem: given a set X of n points in a d-dimensional space, build a data structure that, given any query point y, finds a point x ∈ X whose distance to y is at most (1 + ε) min_{x∈X} ‖x − y‖ for an accuracy parameter ε ∈ (0, 1). Our main result is a data structure that occupies only O(ε⁻² n log(n) log(1/ε)) bits of space, assuming all point coordinates are integers in the range {−n^{O(1)}, …, n^{O(1)}}, i.e., the coordinates have O(log n) bits of precision. This improves over the best previously known space bound of O(ε⁻² n log²(n)), obtained via the randomized dimensionality reduction method of Johnson and Lindenstrauss (1984). We also consider the more general problem of estimating all distances from a collection of query points to all data points X, and provide almost tight upper and lower bounds for the space complexity of this problem.
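The Johnson–Lindenstrauss baseline that the abstract improves upon can be illustrated with a minimal sketch (this is not the paper's data structure, only the classical comparison point): projecting n points onto k = O(ε⁻² log n) random Gaussian dimensions preserves all distances to a query up to a (1 ± ε) factor with high probability. The constants and parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 200, 1000, 0.5
# JL target dimension; the constant 8 is an illustrative choice.
k = int(np.ceil(8 * np.log(n) / eps**2))

# Integer coordinates of bounded magnitude, as in the paper's setting.
X = rng.integers(-100, 101, size=(n, d)).astype(float)

# Gaussian random projection, scaled so expected squared norms are preserved.
G = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ G  # compressed representation: n points in k dimensions

# Compare original vs. projected distances from point 0 to all others.
orig = np.linalg.norm(X[1:] - X[0], axis=1)
proj = np.linalg.norm(Y[1:] - Y[0], axis=1)
ratio = proj / orig
print(ratio.min(), ratio.max())  # typically within [1 - eps, 1 + eps]
```

Storing Y still takes Θ(ε⁻² n log²(n)) bits when each of the k = O(ε⁻² log n) coordinates is kept at O(log n)-bit precision, which is exactly the bound the paper's O(ε⁻² n log(n) log(1/ε))-bit structure improves.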
| Original language | English |
|---|---|
| Pages (from-to) | 2012-2036 |
| Number of pages | 25 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 75 |
| State | Published - 2018 |
| Externally published | Yes |
| Event | 31st Annual Conference on Learning Theory, COLT 2018, Stockholm, Sweden, 6–9 Jul 2018 |
Keywords
- dimension reduction
- distance estimation
- distance sketches
- metric compression
- nearest neighbor
- quantization
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability