Exploring the Gap between Tolerant and Non-Tolerant Distribution Testing

Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, Sayantan Sen

Research output: Contribution to journal › Article › peer-review

Abstract

The framework of distribution testing is currently ubiquitous in the field of property testing. In this model, the input is a probability distribution accessible via independently drawn samples from an oracle. The testing task is to distinguish a distribution that satisfies some property from a distribution that is far, in some distance measure, from satisfying it. The task of tolerant testing imposes a further requirement: distributions that are close to satisfying the property must also be accepted. This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts. When limiting our scope to label-invariant (symmetric) properties of distributions, we prove that the gap is at most quadratic, ignoring poly-logarithmic factors. Conversely, the property of being the uniform distribution is known to have an almost-quadratic gap. For general, not necessarily label-invariant properties, the situation is more complicated, and we show some partial results. We show that if a property requires the distributions to be non-concentrated, that is, the probability mass of the distribution is sufficiently spread out, then it cannot be non-tolerantly tested with o(√n) samples, where n denotes the universe size. This implies at most a quadratic gap, because a distribution can be learned (and hence tolerantly tested against any property) using O(n) samples. Being non-concentrated is a strong requirement on properties, as we also prove a close-to-linear lower bound against their tolerant tests. Apart from the non-concentrated case, we also show that if an input distribution is very concentrated, in the sense that it is mostly supported on a subset of size s of the universe, then it can be learned using only O(s) samples. The learning procedure adapts to the input and works without knowing s in advance.
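As an illustration of the sample-access model described in the abstract, the following is a minimal sketch of a collision-based tester for uniformity, the property the abstract singles out. This is a standard textbook technique and is not taken from the paper. The function names, the acceptance threshold, and the sample count m used in the demo are assumptions made for illustration only. The sketch relies on one fact: the uniform distribution over n elements has collision probability exactly 1/n, while any distribution at total variation distance at least eps from uniform has collision probability at least (1 + 4·eps²)/n.

```python
import random
from collections import Counter


def collision_rate(samples):
    """Fraction of unordered sample pairs that land on the same element."""
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples to count collisions")
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (m * (m - 1) / 2)


def looks_uniform(samples, n, eps):
    """Non-tolerant uniformity test (illustrative threshold, an assumption).

    Accepts when the empirical collision rate is below the midpoint between
    1/n (uniform) and (1 + 4*eps**2)/n (the minimum for eps-far distributions).
    """
    return collision_rate(samples) <= (1 + 2 * eps**2) / n


def draw(support_size, m, rng):
    """Hypothetical sample oracle: m i.i.d. draws, uniform on {0..support_size-1}."""
    return [rng.randrange(support_size) for _ in range(m)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, eps, m = 100, 0.5, 2000  # m chosen generously for the demo
    # Uniform over the whole universe of size n.
    print(looks_uniform(draw(n, m, rng), n, eps))
    # Uniform over half the universe: total variation distance 1/2 from uniform.
    print(looks_uniform(draw(n // 2, m, rng), n, eps))
```

This tester is non-tolerant: it is only designed to separate the uniform distribution from distributions that are eps-far from it, and makes no promise about inputs that are merely close to uniform. That distinction is the one the abstract studies.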

Original language: English
Pages (from-to): 1153-1170
Number of pages: 18
Journal: IEEE Transactions on Information Theory
Volume: 71
Issue number: 2
State: Published - 2025

Keywords

  • Distribution testing
  • non-tolerant testing
  • sample complexity
  • tolerant testing

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Library and Information Sciences
  • Computer Science Applications
