TY - GEN
T1 - Interactive proofs for verifying machine learning
AU - Goldwasser, Shafi
AU - Rothblum, Guy N.
AU - Shafer, Jonathan
AU - Yehudayoff, Amir
N1 - Publisher Copyright: © Shafi Goldwasser, Guy N. Rothblum, Jonathan Shafer, and Amir Yehudayoff.
PY - 2021/2/1
Y1 - 2021/2/1
N2 - We consider the following question: using a source of labeled data and interaction with an untrusted prover, what is the complexity of verifying that a given hypothesis is “approximately correct”? We study interactive proof systems for PAC verification, where a verifier that interacts with a prover is required to accept good hypotheses, and reject bad hypotheses. Both the verifier and the prover are efficient and have access to labeled data samples from an unknown distribution. We are interested in cases where the verifier can use significantly less data than is required for (agnostic) PAC learning, or use a substantially cheaper data source (e.g., using only random samples for verification, even though learning requires membership queries). We believe that today, when data and data-driven algorithms are quickly gaining prominence, the question of verifying purported outcomes of data analyses is very well-motivated. We show three main results. First, we prove that for a specific hypothesis class, verification is significantly cheaper than learning in terms of sample complexity, even if the verifier engages with the prover only in a single-round (NP-like) protocol. Moreover, for this class we prove that single-round verification is also significantly cheaper than testing closeness to the class. Second, for the broad class of Fourier-sparse boolean functions, we show a multi-round (IP-like) verification protocol, where the prover uses membership queries, and the verifier is able to assess the result while only using random samples. Third, we show that verification is not always more efficient. Namely, we show a class of functions where verification requires as many samples as learning does, up to a logarithmic factor.
KW - Complexity gaps
KW - Complexity lower bounds
KW - Distribution testing
KW - Fourier analysis of boolean functions
KW - Goldreich-Levin algorithm
KW - Kushilevitz-Mansour algorithm
KW - PAC learning
UR - http://www.scopus.com/inward/record.url?scp=85114420210&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.ITCS.2021.41
DO - 10.4230/LIPIcs.ITCS.2021.41
M3 - Conference contribution
T3 - Leibniz International Proceedings in Informatics, LIPIcs
SP - 41:1-41:19
BT - 12th Innovations in Theoretical Computer Science Conference, ITCS 2021
A2 - Lee, James R.
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 12th Innovations in Theoretical Computer Science Conference, ITCS 2021
Y2 - 6 January 2021 through 8 January 2021
ER -