## Abstract

Distribution testing is a central part of property testing, with applications to various research areas, such as computational and statistical learning, information theory, and probabilistic program checking. The original distribution testing model relies on samples drawn independently from the distribution to be tested. However, when the distribution in question is over the n-dimensional Hamming cube {0, 1}^{n} for a large n, even reading a few samples is infeasible. To address this, Goldreich and Ron [ITCS 2022] defined the huge object model, in which the samples may only be queried in a few places. For any sample/query model, the following three questions are considered fundamental: (i) understanding which classes of objects can be "learned easily", (ii) characterizing testable properties, that is, properties that can be tested in the given sample/query model using a constant number of samples/queries, and (iii) understanding the gap between adaptive and non-adaptive query/sample complexities. In this work, we study these questions for the huge object model for distribution testing. To do so, we initiate a study of a general class of distribution properties that are invariant under a permutation of the indices of the vectors in {0, 1}^{n}, while still not necessarily being fully symmetric as per the definition used in traditional distribution testing. We prove that every distribution over {0, 1}^{n} whose support has a bounded VC-dimension can be efficiently learned up to a permutation. The number of queries made by the algorithm depends only on the VC-dimension of the support of the distribution and is independent of n. This gives efficient testers for index-invariant distribution properties that admit a global VC-dimension bound. To complement this result, we argue that satisfying only index-invariance or only a VC-dimension bound is insufficient to guarantee a tester whose query complexity is independent of n. Moreover, we prove that the dependency of the sample and query complexities of our tester on the VC-dimension is essentially tight. As a second part of this work, we address the question of the number of queries required for non-adaptive testing. We show that it can be at most quadratic in the number of queries required for an adaptive tester in the case of index-invariant properties. This contrasts with the tight (easily provable) exponential gap between adaptive and non-adaptive testers for general non-index-invariant properties. Finally, we provide an index-invariant property for which the quadratic gap between adaptive and non-adaptive query complexities for testing is almost tight.
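The two notions the abstract combines, a VC-dimension bound on the support and invariance under permuting the indices, can be illustrated on a toy example. The Python sketch below is not the paper's algorithm: it is a brute-force check that reads every vector in full, which is exactly what the huge object model forbids for large n. The function names `vc_dimension` and `permute_indices`, and the convention that the empty set has VC-dimension 0, are assumptions made here for illustration only.

```python
from itertools import combinations


def vc_dimension(support):
    """Brute-force VC-dimension of a set of equal-length 0/1 tuples.

    A set of indices is shattered if the vectors, restricted to those
    indices, realize all 2^d possible 0/1 patterns.
    """
    vectors = set(support)
    if not vectors:
        return 0  # assumed convention for the empty set
    n = len(next(iter(vectors)))
    best = 0
    for d in range(1, n + 1):
        shattered = any(
            len({tuple(v[i] for i in idx) for v in vectors}) == 2 ** d
            for idx in combinations(range(n), d)
        )
        if not shattered:
            break  # subsets of shattered sets are shattered, so we can stop
        best = d
    return best


def permute_indices(support, perm):
    """Apply one permutation of the coordinates to every vector."""
    return {tuple(v[perm[i]] for i in range(len(perm))) for v in support}


# Toy support in {0, 1}^3: the first two coordinates are free, the third is 0.
support = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
permuted = permute_indices(support, (2, 0, 1))

print(vc_dimension(support))   # 2
print(vc_dimension(permuted))  # 2
```

Relabeling the coordinates changes which vectors appear in the support but not its VC-dimension, which is why a VC-dimension bound is compatible with index-invariant properties.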

| Original language | English |
|---|---|
| Pages (from-to) | 3065-3136 |
| Number of pages | 72 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 195 |
| State | Published - 2023 |
| Event | 36th Annual Conference on Learning Theory, COLT 2023 - Bangalore, India. Duration: 12 Jul 2023 → 15 Jul 2023 |

## Keywords

- Distribution Testing
- Huge Object Model
- Index-Invariant Properties
- Query Complexity
- Sample Complexity

## All Science Journal Classification (ASJC) codes

- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability