Regularization learning networks: Deep learning for tabular datasets

Ira Shavitt, Eran Segal

Research output: Contribution to journal › Conference article › peer-review

Abstract

Despite their impressive performance, Deep Neural Networks (DNNs) typically underperform Gradient Boosting Trees (GBTs) on many tabular-dataset learning tasks. We propose that applying a different regularization coefficient to each weight might boost the performance of DNNs by allowing them to make more use of the more relevant inputs. However, this would lead to an intractable number of hyperparameters. Here, we introduce Regularization Learning Networks (RLNs), which overcome this challenge by introducing an efficient hyperparameter tuning scheme that minimizes a new Counterfactual Loss. Our results show that RLNs significantly improve DNNs on tabular datasets and achieve comparable results to GBTs, with the best performance achieved by an ensemble that combines GBTs and RLNs. RLNs produce extremely sparse networks, eliminating up to 99.8% of the network edges and 82% of the input features, thus providing more interpretable models and revealing the importance that the network assigns to different inputs. RLNs could efficiently learn a single network on datasets that comprise both tabular and unstructured data, such as in the setting of medical imaging accompanied by electronic health records. An open source implementation of RLN can be found at https://github.com/irashavitt/regularization_learning_networks.
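The abstract describes the mechanism only at a high level. The following is a minimal sketch of the idea in PyTorch, not the authors' implementation (the reference code is at the GitHub link above): each weight gets its own learnable log-regularization coefficient lambda, the weights are trained on the regularized loss, and the lambdas are updated by differentiating a counterfactual loss (the loss of the just-updated weights on fresh data) through the update step. The names (`rln_step`, `lr_w`, `lr_lam`), the toy data, and the L1 penalty choice are illustrative assumptions.

```python
# Minimal sketch of per-weight regularization with counterfactually tuned
# coefficients (illustrative; see the authors' repository for the real code).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy tabular data: 10 features, only the first 3 informative.
X = torch.randn(256, 10)
y = X[:, :3].sum(dim=1, keepdim=True) + 0.1 * torch.randn(256, 1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()

# One log-regularization coefficient lambda per weight (biases left
# unregularized in this sketch).
lambdas = {name: torch.full_like(p, -4.0)
           for name, p in model.named_parameters() if p.dim() > 1}

lr_w, lr_lam = 1e-2, 1e-1  # step sizes for weights and for lambdas

def rln_step(xb, yb, xb_next, yb_next):
    # Regularized loss: task loss + sum_i exp(lambda_i) * |w_i|  (L1 case).
    loss = loss_fn(model(xb), yb)
    loss.backward()
    signs = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            grad = p.grad
            if name in lambdas:
                signs[name] = torch.sign(p)          # sign of w BEFORE update
                grad = grad + torch.exp(lambdas[name]) * signs[name]
            p -= lr_w * grad                         # SGD step on the weights
            p.grad = None
    # Counterfactual loss: evaluate the *updated* weights on fresh data and
    # differentiate through the update w.r.t. lambda:
    #   w_new = w - lr_w * (g + exp(lambda) * sign(w))
    #   dw_new/dlambda = -lr_w * exp(lambda) * sign(w)
    cf_loss = loss_fn(model(xb_next), yb_next)
    cf_loss.backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in lambdas:
                dcf_dlam = p.grad * (-lr_w) * torch.exp(lambdas[name]) * signs[name]
                lambdas[name] -= lr_lam * dcf_dlam   # tune lambda per weight
            p.grad = None
    return loss.item()

for step in range(200):
    idx = torch.randperm(192)[:64]
    rln_step(X[idx], y[idx], X[192:], y[192:])       # held-out tail as "fresh" batch
```

Under an L1 penalty, weights whose lambdas grow large are driven toward exactly zero, which is consistent with the sparsity the abstract reports (up to 99.8% of edges and 82% of input features eliminated). The paper's full scheme includes details omitted here, such as keeping the overall scale of the lambdas controlled during tuning.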

Original language: English
Pages (from-to): 1379-1389
Number of pages: 11
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
State: Published - 3 Dec 2018
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: 2 Dec 2018 - 8 Dec 2018

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

