TY - GEN
T1 - Characterizing the sample complexity of private learners
AU - Beimel, Amos
AU - Nissim, Kobbi
AU - Stemmer, Uri
PY - 2013/2/11
Y1 - 2013/2/11
N2 - In 2008, Kasiviswanathan el al. defined private learning as a combination of PAC learning and differential privacy [16]. Informally, a private learner is applied to a collection of labeled individual information and outputs a hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. gave a generic construction of private learners for (finite) concept classes, with sample complexity logarithmic in the size of the concept class. This sample complexity is higher than what is needed for non-private learners, hence leaving open the possibility that the sample complexity of private learning may be sometimes significantly higher than that of non-private learning. We give a combinatorial characterization of the sample size sufficient and necessary to privately learn a class of concepts. This characterization is analogous to the well known characterization of the sample complexity of non-private learning in terms of the VC dimension of the concept class. We introduce the notion of probabilistic representation of a concept class, and our new complexity measure RepDim corresponds to the size of the smallest probabilistic representation of the concept class. We show that any private learning algorithm for a concept class C with sample complexity m implies RepDim(C) = O(m), and that there exists a private learning algorithm with sample complexity m = O(RepDim(C)). We further demonstrate that a similar characterization holds for the database size needed for privately computing a large class of optimization problems and also for the well studied problem of private data release.
AB - In 2008, Kasiviswanathan el al. defined private learning as a combination of PAC learning and differential privacy [16]. Informally, a private learner is applied to a collection of labeled individual information and outputs a hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. gave a generic construction of private learners for (finite) concept classes, with sample complexity logarithmic in the size of the concept class. This sample complexity is higher than what is needed for non-private learners, hence leaving open the possibility that the sample complexity of private learning may be sometimes significantly higher than that of non-private learning. We give a combinatorial characterization of the sample size sufficient and necessary to privately learn a class of concepts. This characterization is analogous to the well known characterization of the sample complexity of non-private learning in terms of the VC dimension of the concept class. We introduce the notion of probabilistic representation of a concept class, and our new complexity measure RepDim corresponds to the size of the smallest probabilistic representation of the concept class. We show that any private learning algorithm for a concept class C with sample complexity m implies RepDim(C) = O(m), and that there exists a private learning algorithm with sample complexity m = O(RepDim(C)). We further demonstrate that a similar characterization holds for the database size needed for privately computing a large class of optimization problems and also for the well studied problem of private data release.
KW - differential privacy
KW - pac learning
KW - probabilistic representation
KW - sample complexity
UR - http://www.scopus.com/inward/record.url?scp=84873351478&partnerID=8YFLogxK
U2 - 10.1145/2422436.2422450
DO - 10.1145/2422436.2422450
M3 - Conference contribution
SN - 9781450318594
T3 - ITCS 2013 - Proceedings of the 2013 ACM Conference on Innovations in Theoretical Computer Science
SP - 97
EP - 109
BT - ITCS 2013 - Proceedings of the 2013 ACM Conference on Innovations in Theoretical Computer Science
T2 - 2013 4th ACM Conference on Innovations in Theoretical Computer Science, ITCS 2013
Y2 - 9 January 2013 through 12 January 2013
ER -