Skip to main navigation Skip to search Skip to main content

Thompson sampling for complex online problems

Aditya Gopalan, Shie Mannor, Yishay Mansour

Research output: Contribution to journalConference articlepeer-review

Abstract

2014 We consider stochastic multi-armed bandit problems with complex actions over a set of basic anus, where the decision maker plays a complex action rather than a basic arm in each round. The reward of the complex action is some function of the basic arms' rewards, and the feedback observed may not necessarily be the reward perarm. For instance, when the complex actions are subsets of the arms, we may only observe the maximum reward over the chosen subset. Thus, feedback across complex actions may be coupled due to the nature of the reward function. We prove a frequentist regret bound for Thompson sampling in a very general setting involving parameter, action and observation spaces and a likelihood function over them. The bound holds for discretely-supported priors over the parameter space without additional structural properties such as closed-form posteriors, conjugate prior structure or independence across arms. The regret bound scales logarithmically with time but, more importantly, with an improved constant that non-trivially captures the coupling across complex actions due to the structure of the rewards. As applications, we derive improved regret bounds for classes of complex bandit problems involving selecting subsets of arms, including the first nontrivial regret bounds for nonlinear MAX reward feedback from subsets. Using particle filters for computing posterior distributions which lack an explicit closed-form, we present numerical results for the performance of Thompson sampling for subset-selection and job scheduling problems.

Original languageEnglish
Pages (from-to)169-186
Number of pages18
Journal31st International Conference on Machine Learning, ICML 2014
StatePublished - 2014
Event31st International Conference on Machine Learning, ICML 2014 - Beijing, China
Duration: 21 Jun 201426 Jun 2014

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Software

Fingerprint

Dive into the research topics of 'Thompson sampling for complex online problems'. Together they form a unique fingerprint.

Cite this