How catastrophic can catastrophic forgetting be in linear regression?

Itay Evron, Edward Moroshko, Rachel Ward, Nati Srebro, Daniel Soudry

Research output: Contribution to journal › Conference article › peer-review

Abstract

To better understand catastrophic forgetting, we study fitting an overparameterized linear model to a sequence of tasks with different input distributions. We analyze how much the model forgets the true labels of earlier tasks after training on subsequent tasks, obtaining exact expressions and bounds. We establish connections between continual learning in the linear setting and two other research areas – alternating projections and the Kaczmarz method. In specific settings, we highlight differences between forgetting and convergence to the offline solution as studied in those areas. In particular, when T tasks in d dimensions are presented cyclically for k iterations, we prove an upper bound of T² min{1/√k, d/k} on the forgetting. This stands in contrast to the convergence to the offline solution, which can be arbitrarily slow according to existing alternating projection results. We further show that the T² factor can be lifted when tasks are presented in a random ordering.
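The setup described in the abstract can be simulated directly. The sketch below (a minimal illustration, not the paper's code; task sizes, dimensions, and the random seed are arbitrary choices) fits an overparameterized linear model to a cyclic sequence of realizable linear-regression tasks. Each task update is the minimum-norm correction that fits the current task exactly, which is precisely a block Kaczmarz step, i.e. a projection onto the task's solution set – the connection to alternating projections mentioned above. "Forgetting" is measured as the average error on all tasks' true labels after training.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, n_t, k = 20, 4, 3, 50      # dimension, tasks, samples per task, cycles

w_star = rng.standard_normal(d)                      # shared true labeler
tasks = [rng.standard_normal((n_t, d)) for _ in range(T)]
ys = [X @ w_star for X in tasks]                     # realizable labels

def fit_task(w, X, y):
    # Minimum-norm update that fits (X, y) exactly: project w onto the
    # affine solution set {v : X v = y}. With one task per step this is
    # a block Kaczmarz / alternating-projection iteration.
    return w + X.T @ np.linalg.lstsq(X @ X.T, y - X @ w, rcond=None)[0]

def forgetting(w):
    # Average squared error on the true labels across all tasks.
    return np.mean([np.mean((X @ w - y) ** 2) for X, y in zip(tasks, ys)])

w = np.zeros(d)
for _ in range(k):                                   # cyclic task ordering
    for X, y in zip(tasks, ys):
        w = fit_task(w, X, y)

# Forgetting shrinks with the number of cycles k, consistent with the
# T^2 * min{1/sqrt(k), d/k} upper bound stated in the abstract.
print(forgetting(w))
```

Because the model is overparameterized (T·n_t = 12 constraints in d = 20 dimensions), every task can be fit exactly, and the cyclic projections drive the iterate toward the intersection of all task solution sets.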

Original language: English
Pages (from-to): 4028-4079
Number of pages: 52
Journal: Proceedings of Machine Learning Research
Volume: 178
State: Published - 2022
Event: 35th Conference on Learning Theory, COLT 2022 - London, United Kingdom
Duration: 2 Jul 2022 – 5 Jul 2022
https://proceedings.mlr.press/v178

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
