TY - GEN
T1 - VOI-aware MCTS
AU - Tolpin, David
AU - Shimony, Solomon Eyal
PY - 2012/1/1
Y1 - 2012/1/1
N2 - UCT, a state-of-the-art algorithm for Monte Carlo tree search (MCTS) in games and Markov decision processes, is based on UCB1, a sampling policy for the Multi-armed Bandit problem (MAB) that minimizes the cumulative regret. However, search differs from MAB in that in MCTS it is usually only the final "arm pull" (the actual move selection) that collects a reward, rather than all "arm pulls". In this paper, an MCTS sampling policy based on Value of Information (VOI) estimates of rollouts is suggested. Empirical evaluation of the policy and comparison to UCB1 and UCT is performed on random MAB instances as well as on Computer Go.
AB - UCT, a state-of-the-art algorithm for Monte Carlo tree search (MCTS) in games and Markov decision processes, is based on UCB1, a sampling policy for the Multi-armed Bandit problem (MAB) that minimizes the cumulative regret. However, search differs from MAB in that in MCTS it is usually only the final "arm pull" (the actual move selection) that collects a reward, rather than all "arm pulls". In this paper, an MCTS sampling policy based on Value of Information (VOI) estimates of rollouts is suggested. Empirical evaluation of the policy and comparison to UCB1 and UCT is performed on random MAB instances as well as on Computer Go.
UR - http://www.scopus.com/inward/record.url?scp=84878787762&partnerID=8YFLogxK
U2 - 10.3233/978-1-61499-098-7-929
DO - 10.3233/978-1-61499-098-7-929
M3 - Conference contribution
SN - 9781614990970
T3 - Frontiers in Artificial Intelligence and Applications
SP - 929
EP - 930
BT - ECAI 2012 - 20th European Conference on Artificial Intelligence, 27-31 August 2012, Montpellier, France - Including Prestigious Applications of Artificial Intelligence (PAIS-2012) System Demonstration
PB - IOS Press BV
T2 - 20th European Conference on Artificial Intelligence, ECAI 2012
Y2 - 27 August 2012 through 31 August 2012
ER -