Brief Announcement: Gradual Learning of Deep Recurrent Neural Network

Ziv Aharoni, Gal Rattner, Haim Permuter

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep Recurrent Neural Networks (RNNs) achieve state-of-the-art results in many sequence-to-sequence modeling tasks. However, deep RNNs are difficult to train and tend to suffer from overfitting. Motivated by the Data Processing Inequality (DPI), we formulate the multi-layered network as a Markov chain and introduce a training method that combines gradual, layer-by-layer training with layer-wise gradient clipping. We found that applying our methods, combined with previously introduced regularization and optimization methods, improves state-of-the-art architectures on language modeling tasks.
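The announcement does not detail the clipping mechanism, but the layer-wise variant it names can be sketched as follows: each layer's gradient is rescaled independently so that its L2 norm stays below a threshold, rather than clipping the global norm of all parameters at once. The function name, the plain-list representation of gradients, and the threshold value are illustrative assumptions, not the authors' implementation.

```python
import math

def clip_layerwise(grads_per_layer, max_norm):
    """Clip each layer's gradient vector independently (illustrative sketch).

    grads_per_layer: list of per-layer gradient vectors (lists of floats).
    max_norm: the L2-norm cap applied separately to every layer.
    """
    clipped = []
    for grads in grads_per_layer:
        # L2 norm of this layer's gradient alone.
        norm = math.sqrt(sum(g * g for g in grads))
        # Rescale only if this layer exceeds the cap; others are untouched,
        # unlike global clipping, where one large layer shrinks all gradients.
        scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
        clipped.append([g * scale for g in grads])
    return clipped

# Example: the first layer (norm 5.0) is rescaled to norm 1.0,
# while the second layer (norm 0.5) passes through unchanged.
clipped = clip_layerwise([[3.0, 4.0], [0.3, 0.4]], max_norm=1.0)
```

Under global-norm clipping the small second layer would also have been scaled down; keeping the scaling per-layer is what lets shallower layers retain their full gradient signal.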

Original language: American English
Title of host publication: Cyber Security Cryptography and Machine Learning - Second International Symposium, CSCML 2018, Proceedings
Editors: Itai Dinur, Shlomi Dolev, Sachin Lodha
Publisher: Springer Verlag
Pages: 274-277
Number of pages: 4
ISBN (Electronic): 978-3-319-94147-9
ISBN (Print): 978-3-319-94146-2
DOIs
State: Published - 17 Jun 2018
Event: 2nd International Symposium on Cyber Security Cryptography and Machine Learning, CSCML 2018 - Beer-Sheva, Israel
Duration: 21 Jun 2018 - 22 Jun 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10879 LNCS

Conference

Conference: 2nd International Symposium on Cyber Security Cryptography and Machine Learning, CSCML 2018
Country/Territory: Israel
City: Beer-Sheva
Period: 21/06/18 - 22/06/18

Keywords

  • Data-processing-inequality
  • Machine-learning
  • Recurrent-neural-networks
  • Regularization
  • Training-methods

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
