TY - GEN
T1 - Dueling Bandits with Team Comparisons
AU - Cohen, Lee
AU - Schmidt-Kraepelin, Ulrike
AU - Mansour, Yishay
N1 - Publisher Copyright: © 2021 Neural information processing systems foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - We introduce the dueling teams problem, a new online-learning setting in which the learner observes noisy comparisons of disjoint pairs of k-sized teams from a universe of n players. The goal of the learner is to minimize the number of duels required to identify, with high probability, a Condorcet winning team, i.e., a team which wins against any other disjoint team (with probability at least 1/2). Noisy comparisons are linked to a total order on the teams. We formalize our model by building upon the dueling bandits setting (Yue et al., 2012) and provide several algorithms, both for stochastic and deterministic settings. For the stochastic setting, we provide a reduction to the classical dueling bandits setting, yielding an algorithm that identifies a Condorcet winning team within O((n + k log(k)) max(log log n, log k) / Δ^2) duels, where Δ is a gap parameter. For deterministic feedback, we additionally present a gap-independent algorithm that identifies a Condorcet winning team within O(nk log(k) + k^5) duels.
AB - We introduce the dueling teams problem, a new online-learning setting in which the learner observes noisy comparisons of disjoint pairs of k-sized teams from a universe of n players. The goal of the learner is to minimize the number of duels required to identify, with high probability, a Condorcet winning team, i.e., a team which wins against any other disjoint team (with probability at least 1/2). Noisy comparisons are linked to a total order on the teams. We formalize our model by building upon the dueling bandits setting (Yue et al., 2012) and provide several algorithms, both for stochastic and deterministic settings. For the stochastic setting, we provide a reduction to the classical dueling bandits setting, yielding an algorithm that identifies a Condorcet winning team within O((n + k log(k)) max(log log n, log k) / Δ^2) duels, where Δ is a gap parameter. For deterministic feedback, we additionally present a gap-independent algorithm that identifies a Condorcet winning team within O(nk log(k) + k^5) duels.
UR - http://www.scopus.com/inward/record.url?scp=85132553248&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Advances in Neural Information Processing Systems
SP - 20633
EP - 20644
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural information processing systems foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -