Abstract
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of K possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic (ε, δ)-PAC version of the problem, with sample complexity of O((poly(K) + 1/ε²) log(|H|/δ)) for any finite hypothesis class H. In terms of the leading dependence on ε, this improves upon existing bounds for the problem, which are of the form O(K/ε²). We also provide an extension of this result to general classes and establish similar sample complexity bounds in which log |H| is replaced by the Natarajan dimension. This matches the optimal rate in the full-information version of the problem and resolves an open question studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011), who demonstrated that the multiplicative price of bandit feedback in realizable PAC learning is Θ(K). We complement this by revealing a stark contrast with the agnostic case, where the price of bandit feedback is only O(1) as ε → 0. Our algorithm utilizes a stochastic optimization technique to minimize a log-barrier potential based on Frank-Wolfe updates for computing a low-variance exploration distribution over the hypotheses, and is made computationally efficient provided access to an ERM oracle over H.
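To make the last sentence of the abstract more concrete, the following is a minimal sketch of Frank-Wolfe updates applied to a log-barrier potential over distributions on a small, explicitly enumerated hypothesis class. The specific potential (the average of -log q(h(x) | x), where q is the label distribution induced by sampling a hypothesis from p), the grid line search, and the names `label_counts`, `label_probs`, `potential`, and `frank_wolfe` are all illustrative assumptions, not the construction analyzed in the paper. The brute-force minimum over hypotheses stands in for the ERM oracle mentioned in the abstract.

```python
import math


def label_counts(preds, K):
    """counts[i][y] = number of hypotheses that predict label y on point i."""
    n = len(preds[0])
    counts = [[0] * K for _ in range(n)]
    for row in preds:
        for i, y in enumerate(row):
            counts[i][y] += 1
    return counts


def label_probs(p, preds, K):
    """q[i][y] = probability that a hypothesis drawn from p predicts y on point i."""
    n = len(preds[0])
    q = [[0.0] * K for _ in range(n)]
    for weight, row in zip(p, preds):
        for i, y in enumerate(row):
            q[i][y] += weight
    return q


def potential(p, preds, K):
    """Assumed log-barrier: mean of -log q[i][h(x_i)] over hypotheses h and points i."""
    counts = label_counts(preds, K)
    q = label_probs(p, preds, K)
    total = 0.0
    for i, row in enumerate(counts):
        for y, c in enumerate(row):
            if c > 0:
                total -= c * math.log(q[i][y])
    return total / (len(preds) * len(preds[0]))


def frank_wolfe(preds, K, steps=30):
    """Frank-Wolfe over the simplex of hypothesis weights.

    preds[h][i] is the label (0..K-1) that hypothesis h assigns to point i.
    """
    m, n = len(preds), len(preds[0])
    counts = label_counts(preds, K)
    p = [1.0 / m] * m  # uniform start keeps every relevant q[i][y] positive
    gammas = [0.0, 0.05, 0.1, 0.2, 0.4, 0.8]  # step sizes tried by the line search
    for _ in range(steps):
        q = label_probs(p, preds, K)
        # Partial derivative of the potential with respect to p[h].
        grad = [
            -sum(counts[i][y] / q[i][y] for i, y in enumerate(row)) / (m * n)
            for row in preds
        ]
        # Linear minimization step: the best simplex vertex is a single
        # hypothesis (brute force here; an ERM oracle in the abstract's setting).
        best = min(range(m), key=grad.__getitem__)

        def mix(g):
            return [(1 - g) * w + (g if h == best else 0.0) for h, w in enumerate(p)]

        # gamma = 0 is a candidate, so the potential never increases.
        p = min((mix(g) for g in gammas), key=lambda c: potential(c, preds, K))
    return p


if __name__ == "__main__":
    K = 4
    # Hypothetical toy class: 6 hypotheses evaluated on 20 points.
    preds = [[(h * i + h + i) % K for i in range(20)] for h in range(6)]
    print(frank_wolfe(preds, K, steps=10))
```

The terms 1/q(h(x) | x) that appear in the gradient are the same quantities that control the variance of an importance-weighted accuracy estimate for h under bandit feedback, which is why a barrier of this kind is a natural candidate for a low-variance exploration distribution. Capping the step size below 1 keeps every hypothesis weight strictly positive, so the logarithms stay finite.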
| Original language | English |
|---|---|
| Journal | Advances in Neural Information Processing Systems |
| Volume | 37 |
| State | Published - 2024 |
| Event | 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada (Duration: 9 Dec 2024 → 15 Dec 2024) |
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Information Systems
- Signal Processing
Title: Fast Rates for Bandit PAC Multiclass Classification