TY - CHAP
T1 - Bayesian reinforcement learning
AU - Vlassis, Nikos
AU - Ghavamzadeh, Mohammad
AU - Mannor, Shie
AU - Poupart, Pascal
N1 - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2012.
PY - 2012
Y1 - 2012
N2 - This chapter surveys recent lines of work that use Bayesian techniques for reinforcement learning. In Bayesian learning, uncertainty is expressed by a prior distribution over unknown parameters and learning is achieved by computing a posterior distribution based on the data observed. Hence, Bayesian reinforcement learning distinguishes itself from other forms of reinforcement learning by explicitly maintaining a distribution over various quantities such as the parameters of the model, the value function, the policy or its gradient. This yields several benefits: a) domain knowledge can be naturally encoded in the prior distribution to speed up learning; b) the exploration/exploitation tradeoff can be naturally optimized; and c) notions of risk can be naturally taken into account to obtain robust policies.
KW - Covariance
UR - http://www.scopus.com/inward/record.url?scp=85042936847&partnerID=8YFLogxK
U2 - https://doi.org/10.1007/978-3-642-27645-3_11
DO - https://doi.org/10.1007/978-3-642-27645-3_11
M3 - Chapter
T3 - Adaptation, Learning, and Optimization
SP - 359
EP - 386
BT - Adaptation, Learning, and Optimization
ER -