Teaching and compressing for low VC-dimension

Research output: Chapter in Book/Report/Conference proceeding, Chapter (peer-reviewed)

Abstract

In this work we study the quantitative relation between VC-dimension and two other basic parameters related to learning and teaching, namely the quality of sample compression schemes and of teaching sets for classes of low VC-dimension. Let C be a binary concept class of size m and VC-dimension d. Prior to this work, the best known upper bounds for both parameters were log(m), while the best lower bounds are linear in d. We present significantly better upper bounds on both, as follows. Set k = O(d 2^d log log |C|). We show that there always exists a concept c in C with a teaching set (i.e. a list of c-labeled examples uniquely identifying c in C) of size k. This problem was studied by Kuhlmann (On teaching and learning intersection-closed concept classes. In: EuroCOLT, pp 168-182, 1999). Our construction implies that the recursive teaching (RT) dimension of C is at most k as well. The RT-dimension was suggested by Zilles et al. (J Mach Learn Res 12:349-384, 2011) and Doliwa et al. (Recursive teaching dimension, learning complexity, and maximum classes. In: ALT, pp 209-223, 2010). The same notion (under the name partial-ID width) was independently studied by Wigderson and Yehudayoff (Population recovery and partial identification. In: FOCS, pp 390-399, 2012). An upper bound on this parameter that depends only on d is known just for the very simple case d = 1, and the question is open even for d = 2. We also make small progress towards this seemingly modest goal. We further construct sample compression schemes of size k for C, with k log(k) bits of additional information. Roughly speaking, given any list of C-labeled examples of arbitrary length, we can retain only k labeled examples in a way that allows one to recover the labels of all other examples in the list, using k log(k) additional information bits. This problem was first suggested by Littlestone and Warmuth (Relating data compression and learnability. Unpublished, 1986).
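To make the definitions above concrete, here is a small brute-force sketch in Python. It is not the construction from this chapter, only an illustration of the quantities being bounded. It assumes a finite domain {0, ..., n-1} and encodes each concept as a tuple of 0/1 labels; the function names (`vc_dimension`, `teaching_set`, `recursive_teaching_dimension`) are hypothetical. The recursive teaching dimension follows the layered definition of Zilles et al.: repeatedly remove the concepts whose teaching sets in the remaining class are smallest, and take the largest of these minima. All searches are exponential, so this is only suitable for tiny classes.

```python
from itertools import combinations

# Hypothetical encoding: a concept class C is a list of distinct concepts over
# the finite domain {0, ..., n-1}; each concept is a tuple of 0/1 labels.


def shatters(C, points):
    """True if C realizes all 2^|points| labelings of the given points."""
    patterns = {tuple(c[x] for x in points) for c in C}
    return len(patterns) == 2 ** len(points)


def vc_dimension(C, n):
    """Largest d such that some d-subset of the domain is shattered by C."""
    d = 0
    for size in range(1, n + 1):
        if any(shatters(C, pts) for pts in combinations(range(n), size)):
            d = size
        else:
            break  # subsets of shattered sets are shattered, so stop early
    return d


def teaching_set(c, C, n):
    """A smallest set S of domain points such that c is the only concept in C
    agreeing with c on S, i.e. the c-labeled examples on S identify c in C."""
    if c not in C:
        raise ValueError("c must be a concept of C")
    for size in range(n + 1):
        for S in combinations(range(n), size):
            consistent = [h for h in C if all(h[x] == c[x] for x in S)]
            if len(consistent) == 1:
                return S
    raise ValueError("C contains duplicate copies of c")


def recursive_teaching_dimension(C, n):
    """Peel off, layer by layer, the concepts with the smallest teaching set in
    the remaining class; the RT-dimension is the largest such minimum."""
    remaining = list(C)
    rtd = 0
    while remaining:
        sizes = {c: len(teaching_set(c, remaining, n)) for c in remaining}
        t = min(sizes.values())
        rtd = max(rtd, t)
        remaining = [c for c in remaining if sizes[c] > t]
    return rtd


# Example: threshold concepts c_t(x) = 1 iff x >= t on the domain {0, 1, 2, 3}.
n = 4
thresholds = [tuple(1 if x >= t else 0 for x in range(n)) for t in range(n + 1)]
print(vc_dimension(thresholds, n))  # 1
print(min(len(teaching_set(c, thresholds, n)) for c in thresholds))  # 1
print(recursive_teaching_dimension(thresholds, n))  # 1
```

On these five thresholds, the all-ones and all-zeros concepts each have a teaching set of size 1, while a middle concept such as (0, 0, 1, 1) needs two examples in the full class. Removing the easy concepts first lets the remaining ones be taught with fewer examples, which is why the recursive teaching dimension stays at 1 here.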

Original language: English
Title of host publication: A Journey through Discrete Mathematics
Subtitle of host publication: A Tribute to Jiri Matousek
Pages: 633-656
Number of pages: 24
ISBN (Electronic): 9783319444796
State: Published - 1 Jan 2017

All Science Journal Classification (ASJC) codes

  • Economics, Econometrics and Finance (all)
  • General Computer Science
  • General Economics, Econometrics and Finance
  • General Business, Management and Accounting
  • General Mathematics

