Nearly optimal classification for semimetrics

Research output: Contribution to conference › Paper › peer-review

Abstract

We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the density dimension dens and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates. Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d = (R/γ)^dens points, and further that finding a significantly better compression is algorithmically intractable unless P = NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.
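
To make the abstract's two central notions concrete, the following is a minimal sketch and not the paper's algorithm. It assumes squared Euclidean distance as the example semimetric (non-negative and symmetric, but violating the triangle inequality) and uses a greedy γ-net, a standard device from the metric setting, as a stand-in for sample compression. All function names and the toy data are hypothetical. Because the triangle inequality is unavailable, consistency of the compressed set is checked explicitly rather than assumed.

```python
import itertools


def sq_euclid(x, y):
    """Squared Euclidean distance: non-negative and symmetric, but not a metric."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def find_triangle_violation(points, dist):
    """Return a triple (x, y, z) with dist(x, z) > dist(x, y) + dist(y, z), else None."""
    for x, y, z in itertools.permutations(points, 3):
        if dist(x, z) > dist(x, y) + dist(y, z):
            return (x, y, z)
    return None


def greedy_net(sample, dist, gamma):
    """Keep a labeled point only if it is at least gamma away from every point kept so far."""
    net = []
    for x, label in sample:
        if all(dist(x, c) >= gamma for c, _ in net):
            net.append((x, label))
    return net


def nn_label(net, dist, x):
    """Label x by its nearest neighbor in the compressed set."""
    return min(net, key=lambda c: dist(x, c[0]))[1]


# Hypothetical toy data: two well-separated clusters on the line.
sample = [((0.0,), 0), ((0.5,), 0), ((1.0,), 0),
          ((9.0,), 1), ((9.5,), 1), ((10.0,), 1)]

# Squared distance breaks the triangle inequality even on collinear points.
violation = find_triangle_violation([x for x, _ in sample], sq_euclid)

# Compress the sample, then verify that 1-NN on the compressed set
# still labels every original point correctly (not guaranteed in general).
net = greedy_net(sample, sq_euclid, gamma=4.0)
consistent = all(nn_label(net, sq_euclid, x) == y for x, y in sample)
```

On this toy sample the net keeps two of the six points, and nearest-neighbor labeling over those two points agrees with every original label. The paper's compression scheme and its (R/γ)^dens bound are more refined than this greedy pass.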

Original language: English
Pages: 379-388
Number of pages: 10
State: Published - 1 Jan 2016
Event: 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016 - Cadiz, Spain
Duration: 9 May 2016 → 11 May 2016

Conference

Conference: 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016
Country/Territory: Spain
City: Cadiz
Period: 9/05/16 → 11/05/16

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Statistics and Probability
