TY - CONF
T1 - Highway State Gating for Recurrent Highway Networks
T2 - 2nd International Symposium on Cyber Security Cryptography and Machine Learning, CSCML 2018
AU - Shoham, Ron
AU - Permuter, Haim
N1 - Publisher Copyright: © 2018, Springer International Publishing AG, part of Springer Nature.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Recurrent Neural Networks (RNNs) play a major role in sequential learning and have outperformed traditional algorithms on many benchmarks. Training deep RNNs remains a challenge, however, and most state-of-the-art models use a transition depth of only 2–4 layers. Recurrent Highway Networks (RHNs) were introduced to tackle this issue and have achieved state-of-the-art performance on several benchmarks using a depth of 10 layers. However, this architecture hits a bottleneck: its performance ceases to improve when more layers are added. In this work, we analyze the causes of this behavior and postulate that the main source is the way information flows through time. We introduce a novel, simple variation of the RHN cell, called Highway State Gating (HSG), which allows more layers to be added while continuing to improve performance. By using a gating mechanism for the state, we let the network “choose” whether to pass information directly through time or to gate it. This mechanism also allows the gradient to back-propagate directly through time and therefore yields slightly faster convergence. We use the Penn Treebank (PTB) dataset as a platform for an empirical proof of concept. Empirical results show that Highway State Gating improves performance at all depths, and that the improvement grows as the depth increases.
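The abstract describes the Highway State Gating mechanism only in words: a gate mixes the previous hidden state with the new cell output, so state information and gradients can flow directly through time. A minimal PyTorch sketch of that idea follows; the class name HighwayStateGate, the gate parameterization (a single linear layer over the concatenated states), and all variable names are illustrative assumptions, not taken from the paper.

import torch
import torch.nn as nn

class HighwayStateGate(nn.Module):
    # Convex combination of the previous hidden state and the new cell
    # output, so the state (and its gradient) can skip the deep transition
    # and pass directly through time. Assumed parameterization: one linear
    # layer on the concatenated states; the paper's exact form may differ.
    def __init__(self, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, prev_state: torch.Tensor, cell_output: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per unit, how much of the new output to keep.
        g = torch.sigmoid(self.linear(torch.cat([prev_state, cell_output], dim=-1)))
        # When g -> 0 the previous state is passed through unchanged,
        # giving the direct back-propagation path the abstract mentions.
        return g * cell_output + (1.0 - g) * prev_state

At each time step the gate would wrap the deep RHN transition, e.g. state = gate(state, rhn_cell(x_t, state)), where rhn_cell is likewise a hypothetical name for any RHN cell implementation.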
KW - Deep learning
KW - Machine learning
KW - Recurrent Highway Network
KW - Recurrent Neural Networks
KW - Sequential learning
UR - http://www.scopus.com/inward/record.url?scp=85049013116&partnerID=8YFLogxK
DO - 10.1007/978-3-319-94147-9_10
M3 - Conference contribution
SN - 9783319941462
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 120
EP - 128
BT - Cyber Security Cryptography and Machine Learning - Second International Symposium, CSCML 2018, Proceedings
A2 - Dinur, Itai
A2 - Dolev, Shlomi
A2 - Lodha, Sachin
PB - Springer Verlag
Y2 - 21 June 2018 through 22 June 2018
ER -