TY - GEN
T1 - Value iteration networks
AU - Tamar, Aviv
AU - Wu, Yi
AU - Thomas, Garrett
AU - Levine, Sergey
AU - Abbeel, Pieter
N1 - Funding Information: Research partially funded by Siemens, ONR PECASE award, Army Research Office through the MAST program, and an NSF CAREER award (#1351028). A. T. partially funded by the Viterbi Scholarship, Technion. Y. W. partially funded by a DARPA PPAML program, contract FA8750-14-C-0011.
PY - 2017
Y1 - 2017
AB - We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains. This paper is a significantly abridged version, targeted at the IJCAI audience, of the original NIPS 2016 paper with the same title, available here: https://arxiv.org/abs/1602.02867.
UR - http://www.scopus.com/inward/record.url?scp=85031930412&partnerID=8YFLogxK
U2 - 10.24963/ijcai.2017/700
DO - 10.24963/ijcai.2017/700
M3 - Conference contribution
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 4949
EP - 4953
BT - 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
A2 - Sierra, Carles
T2 - 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Y2 - 19 August 2017 through 25 August 2017
ER -