TY - GEN
T1 - Direct validation of the information bottleneck principle for deep nets
AU - Elad, Adar
AU - Haviv, Doron
AU - Blau, Yochai
AU - Michaeli, Tomer
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - The information bottleneck (IB) has been suggested as a fundamental principle governing performance in deep neural nets (DNNs). This idea sparked research on the information plane dynamics during training with the cross-entropy loss, and on using the IB of some 'bottleneck' layer as a loss function. However, the claim that reaching the maximal value of the IB Lagrangian in each layer leads to optimal performance was in fact never directly confirmed. In this paper, we propose a direct way of validating this hypothesis, using layer-by-layer training with the IB loss. In accordance with the original theory, we train each DNN layer explicitly with the IB objective (and without any classification loss), and freeze it before moving on to train the next layer. While mutual information (MI) is generally hard to estimate in high dimensions, we show that in the case of MI between DNN layers, this can be done quite accurately using a modification of the recently proposed mutual information neural estimator. Interestingly, we find that layer-by-layer training with the IB loss leads to accuracy which is on par with end-to-end training with the cross-entropy loss. This is thus the first direct experimental illustration of the link between the IB value in each layer and a net's performance.
KW - Information bottleneck
KW - Information theory
KW - Theory of deep learning
UR - http://www.scopus.com/inward/record.url?scp=85082494062&partnerID=8YFLogxK
U2 - 10.1109/ICCVW.2019.00099
DO - 10.1109/ICCVW.2019.00099
M3 - Conference contribution
T3 - Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
SP - 758
EP - 762
BT - Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
T2 - 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019
Y2 - 27 October 2019 through 28 October 2019
ER -