TY - GEN
T1 - From External to Swap Regret 2.0
T2 - 56th Annual ACM Symposium on Theory of Computing, STOC 2024
AU - Dagan, Yuval
AU - Daskalakis, Constantinos
AU - Fishelson, Maxwell
AU - Golowich, Noah
N1 - Publisher Copyright: © 2024 Owner/Author.
PY - 2024/6/10
Y1 - 2024/6/10
N2 - We provide a novel reduction from swap-regret minimization to external-regret minimization, which improves upon the classical reductions of Blum-Mansour and Stoltz-Lugosi in that it does not require finiteness of the space of actions. We show that, whenever there exists a no-external-regret algorithm for some hypothesis class, there must also exist a no-swap-regret algorithm for that same class. For the problem of learning with expert advice, our result implies that it is possible to guarantee that the swap regret is bounded by ϵ after (log N)^{Õ(1/ϵ)} rounds and with O(N) per-iteration complexity, where N is the number of experts, while the classical reductions of Blum-Mansour and Stoltz-Lugosi require at least ω(N/ϵ²) rounds and at least ω(N³) total computational cost. Our result comes with an associated lower bound, which - in contrast to that of Blum-Mansour - holds for oblivious and ℓ₁-constrained adversaries and learners that can employ distributions over experts, showing that the number of rounds must be ω(N/ϵ²) or exponential in 1/ϵ. Our reduction implies that, if no-regret learning is possible in some game, then this game must have approximate correlated equilibria of arbitrarily good approximation. This strengthens the folklore implication of no-regret learning that approximate coarse correlated equilibria exist. Importantly, it provides a sufficient condition for the existence of approximate correlated equilibria which vastly extends the requirement that the action set is finite, or the requirement that the action set is compact and the utility functions are continuous, allowing for games with finite Littlestone or finite sequential fat-shattering dimension, thus answering a question left open in "Fast rates for nonparametric online learning: from realizability to learning in games" and "Online learning and solving infinite games with an ERM oracle".
Moreover, it answers several outstanding questions about equilibrium computation and/or learning in games. In particular, for constant values of ϵ: (a) we show that ϵ-approximate correlated equilibria in extensive-form games can be computed efficiently, advancing a long-standing open problem for extensive-form games; see e.g. "Extensive-form correlated equilibrium: Definition and computational complexity" and "Polynomial-Time Linear-Swap Regret Minimization in Imperfect-Information Sequential Games"; (b) we show that the query and communication complexities of computing ϵ-approximate correlated equilibria in N-action normal-form games are N · polylog(N) and polylog(N) respectively, advancing an open problem of "Informational Bounds on Equilibria"; (c) we show that ϵ-approximate correlated equilibria of sparsity polylog(N) can be computed efficiently, advancing an open problem of "Simple Approximate Equilibria in Large Games"; (d) finally, we show that in the adversarial bandit setting, sublinear swap regret can be achieved in only Õ(N) rounds, advancing an open problem of "From External to Internal Regret" and "Tight Lower Bound and Efficient Reduction for Swap Regret".
AB - We provide a novel reduction from swap-regret minimization to external-regret minimization, which improves upon the classical reductions of Blum-Mansour and Stoltz-Lugosi in that it does not require finiteness of the space of actions. We show that, whenever there exists a no-external-regret algorithm for some hypothesis class, there must also exist a no-swap-regret algorithm for that same class. For the problem of learning with expert advice, our result implies that it is possible to guarantee that the swap regret is bounded by ϵ after (log N)^{Õ(1/ϵ)} rounds and with O(N) per-iteration complexity, where N is the number of experts, while the classical reductions of Blum-Mansour and Stoltz-Lugosi require at least ω(N/ϵ²) rounds and at least ω(N³) total computational cost. Our result comes with an associated lower bound, which - in contrast to that of Blum-Mansour - holds for oblivious and ℓ₁-constrained adversaries and learners that can employ distributions over experts, showing that the number of rounds must be ω(N/ϵ²) or exponential in 1/ϵ. Our reduction implies that, if no-regret learning is possible in some game, then this game must have approximate correlated equilibria of arbitrarily good approximation. This strengthens the folklore implication of no-regret learning that approximate coarse correlated equilibria exist. Importantly, it provides a sufficient condition for the existence of approximate correlated equilibria which vastly extends the requirement that the action set is finite, or the requirement that the action set is compact and the utility functions are continuous, allowing for games with finite Littlestone or finite sequential fat-shattering dimension, thus answering a question left open in "Fast rates for nonparametric online learning: from realizability to learning in games" and "Online learning and solving infinite games with an ERM oracle".
Moreover, it answers several outstanding questions about equilibrium computation and/or learning in games. In particular, for constant values of ϵ: (a) we show that ϵ-approximate correlated equilibria in extensive-form games can be computed efficiently, advancing a long-standing open problem for extensive-form games; see e.g. "Extensive-form correlated equilibrium: Definition and computational complexity" and "Polynomial-Time Linear-Swap Regret Minimization in Imperfect-Information Sequential Games"; (b) we show that the query and communication complexities of computing ϵ-approximate correlated equilibria in N-action normal-form games are N · polylog(N) and polylog(N) respectively, advancing an open problem of "Informational Bounds on Equilibria"; (c) we show that ϵ-approximate correlated equilibria of sparsity polylog(N) can be computed efficiently, advancing an open problem of "Simple Approximate Equilibria in Large Games"; (d) finally, we show that in the adversarial bandit setting, sublinear swap regret can be achieved in only Õ(N) rounds, advancing an open problem of "From External to Internal Regret" and "Tight Lower Bound and Efficient Reduction for Swap Regret".
KW - correlated equilibrium
KW - large action space
KW - swap regret
UR - http://www.scopus.com/inward/record.url?scp=85196633980&partnerID=8YFLogxK
U2 - 10.1145/3618260.3649681
DO - 10.1145/3618260.3649681
M3 - Conference contribution
T3 - Proceedings of the Annual ACM Symposium on Theory of Computing
SP - 1216
EP - 1222
BT - STOC 2024 - Proceedings of the 56th Annual ACM Symposium on Theory of Computing
A2 - Mohar, Bojan
A2 - Shinkar, Igor
A2 - O'Donnell, Ryan
PB - Association for Computing Machinery
Y2 - 24 June 2024 through 28 June 2024
ER -