TY - GEN
T1 - Reviving and improving recurrent back-propagation
AU - Liao, Renjie
AU - Xiong, Yuwen
AU - Fetaya, Ethan
AU - Zhang, Lisa
AU - Yoon, Ki Jung
AU - Pitkow, Xaq
AU - Urtasun, Raquel
AU - Zemel, Richard
N1 - Publisher Copyright: © Copyright 2018 by the author(s).
PY - 2018
Y1 - 2018
N2 - In this paper, we revisit the recurrent back-propagation (RBP) algorithm (Almeida, 1987; Pineda, 1987), discuss the conditions under which it applies as well as how to satisfy them in deep neural networks. We show that RBP can be unstable and propose two variants based on conjugate gradient on the normal equations (CG-RBP) and Neumann series (Neumann-RBP). We further investigate the relationship between Neumann-RBP and back-propagation through time (BPTT) and its truncated version (TBPTT). Our Neumann-RBP has the same time complexity as TBPTT but only requires constant memory, whereas TBPTT's memory cost scales linearly with the number of truncation steps. We examine all RBP variants, along with BPTT and TBPTT, in three different application domains: associative memory with continuous Hopfield networks, document classification in citation networks using graph neural networks, and hyperparameter optimization for fully connected networks. All experiments demonstrate that RBPs, especially the Neumann-RBP variant, are efficient and effective for optimizing convergent recurrent neural networks.
UR - http://www.scopus.com/inward/record.url?scp=85057223497&partnerID=8YFLogxK
M3 - Conference contribution
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 4807
EP - 4820
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Dy, Jennifer
A2 - Krause, Andreas
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -