Sign Based Derivative Filtering for Stochastic Gradient Descent

Konstantin Berestizshevsky, Guy Even

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We study the performance of stochastic gradient descent (SGD) in deep neural network (DNN) models. We show that during a single training epoch the signs of the partial derivatives of the loss with respect to a single parameter are distributed almost uniformly over the minibatches. We propose an optimization routine in which we maintain a moving-average history of the sign of each derivative. This history is used to classify each new derivative as "exploratory" if its sign disagrees with the history, or as "exploiting" if its sign agrees with it. Each derivative is then weighted according to its classification, providing control over the balance between exploration and exploitation. As we demonstrate through a series of experiments, the proposed approach trains models to higher accuracy.
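The abstract describes the routine only at a high level. Below is a minimal NumPy sketch of how such sign-based filtering could be realized; the function name `sign_filtered_step` and the hyperparameters `beta`, `w_exploit`, and `w_explore` are illustrative assumptions, not the paper's actual interface or values.

```python
import numpy as np

def sign_filtered_step(params, grads, history, lr=0.01,
                       beta=0.9, w_exploit=1.0, w_explore=0.1):
    """Apply one SGD update with sign-based derivative filtering.

    `history` holds an exponential moving average of the sign of each
    partial derivative; its entries match the shapes of `params`.
    """
    new_params, new_history = [], []
    for p, g, h in zip(params, grads, history):
        h = beta * h + (1.0 - beta) * np.sign(g)    # update the sign history
        agrees = np.sign(g) == np.sign(h)           # "exploiting" derivatives
        w = np.where(agrees, w_exploit, w_explore)  # down-weight "exploratory" ones
        new_params.append(p - lr * w * g)
        new_history.append(h)
    return new_params, new_history

# Toy usage: one parameter tensor, sign history initialized to zeros.
params = [np.array([0.5, -1.2])]
history = [np.zeros_like(params[0])]
grads = [np.array([0.3, -0.1])]
params, history = sign_filtered_step(params, grads, history)
```

In this sketch, derivatives whose sign agrees with the accumulated history receive full weight (exploitation), while disagreeing derivatives are attenuated rather than discarded, preserving some exploration.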

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning – ICANN 2019
Subtitle of host publication: Deep Learning - 28th International Conference on Artificial Neural Networks, Proceedings
Editors: Igor V. Tetko, Pavel Karpov, Fabian Theis, Vera Kurková
Pages: 208-219
Number of pages: 12
DOIs
State: Published - 2019
Event: 28th International Conference on Artificial Neural Networks, ICANN 2019 - Munich, Germany
Duration: 17 Sep 2019 → 19 Sep 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11728 LNCS

Conference

Conference: 28th International Conference on Artificial Neural Networks, ICANN 2019
Country/Territory: Germany
City: Munich
Period: 17/09/19 → 19/09/19

Keywords

  • Deep learning
  • Gradients
  • Neural networks
  • Optimization

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
