TY - GEN
T1 - From bandits to experts
T2 - 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
AU - Mannor, Shie
AU - Shamir, Ohad
PY - 2011
Y1 - 2011
N2 - We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game. In addition to observing the reward of the chosen action, the decision maker gets side observations on the reward he would have obtained had he chosen some of the other actions. The observation structure is encoded as a graph, where node i is linked to node j if sampling i provides information on the reward of j. This setting naturally interpolates between the well-known "experts" setting, where the decision maker can view all rewards, and the multi-armed bandits setting, where the decision maker can only view the reward of the chosen action. We develop practical algorithms with provable regret guarantees, which depend on non-trivial graph-theoretic properties of the information feedback structure. We also provide partially-matching lower bounds.
AB - We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game. In addition to observing the reward of the chosen action, the decision maker gets side observations on the reward he would have obtained had he chosen some of the other actions. The observation structure is encoded as a graph, where node i is linked to node j if sampling i provides information on the reward of j. This setting naturally interpolates between the well-known "experts" setting, where the decision maker can view all rewards, and the multi-armed bandits setting, where the decision maker can only view the reward of the chosen action. We develop practical algorithms with provable regret guarantees, which depend on non-trivial graph-theoretic properties of the information feedback structure. We also provide partially-matching lower bounds.
UR - http://www.scopus.com/inward/record.url?scp=85162303757&partnerID=8YFLogxK
M3 - منشور من مؤتمر
SN - 9781618395993
T3 - Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
BT - Advances in Neural Information Processing Systems 24
Y2 - 12 December 2011 through 14 December 2011
ER -