How to Boost Any Loss Function

Richard Nock, Yishay Mansour

Research output: Contribution to journal › Conference article › peer-review

Abstract

Boosting is a highly successful ML-born optimization setting in which one is required to learn arbitrarily good models in a computationally efficient way, based on access to a weak learner oracle that provides classifiers performing at least slightly differently from random guessing. A key difference with gradient-based optimization is that boosting's original model does not require access to first-order information about a loss, yet the decades-long history of boosting has quickly evolved it into a first-order optimization setting, sometimes even wrongfully defining it as such. Owing to recent progress extending gradient-based optimization to use only a loss's zeroth (0th) order information to learn, this raises the question: what loss functions can be efficiently optimized with boosting, and what information is really needed for boosting to meet the original boosting blueprint's requirements? We provide a constructive formal answer essentially showing that any loss function can be optimized with boosting, and thus boosting can achieve a feat not yet known to be possible in the classical 0th order setting, since loss functions are not required to be convex, nor differentiable or Lipschitz, and in fact not required to be continuous either. Some tools we use are rooted in quantum calculus, the mathematical field (not to be confused with quantum computation) that studies calculus without passing to the limit, and thus without using first-order information.
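
To make the closing remark on quantum calculus concrete, the sketch below computes the standard h-derivative, the slope of the secant between x and x + h, which needs only two evaluations of the function and no limit. It is an illustration only and not the algorithm of the paper: the names `h_derivative` and `zero_one_loss` are hypothetical, and the 0/1 loss is chosen here simply because it is discontinuous and has no usable gradient.

```python
def h_derivative(f, x, h):
    """Secant slope (f(x + h) - f(x)) / h; the offset h stays nonzero, no limit is taken."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (f(x + h) - f(x)) / h


def zero_one_loss(margin):
    """Hypothetical example loss: 1 on a non-positive margin, 0 otherwise (discontinuous at 0)."""
    return 1.0 if margin <= 0 else 0.0


# Only zeroth-order information (two loss values) is used, yet the secant
# across the jump still reports that increasing the margin lowers the loss.
slope_across_jump = h_derivative(zero_one_loss, -0.5, 1.0)

# On a smooth function the same quantity is 2x + h for f(x) = x^2.
slope_square = h_derivative(lambda x: x * x, 1.0, 2.0)
```

The ordinary derivative of the 0/1 loss is zero wherever it exists, so it carries no signal; the secant slope with a finite offset does, which is the sense in which calculus "without passing to the limit" remains informative for losses that are not differentiable or even continuous.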

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 37
State: Published - 2024
Event: 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 2024 to 15 Dec 2024

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
