TY - GEN
T1 - Accuracy on the Line
T2 - 38th International Conference on Machine Learning, ICML 2021
AU - Miller, John
AU - Taori, Rohan
AU - Raghunathan, Aditi
AU - Sagawa, Shiori
AU - Koh, Pang Wei
AU - Shankar, Vaishaal
AU - Liang, Percy
AU - Carmon, Yair
AU - Schmidt, Ludwig
N1 - Publisher Copyright: Copyright © 2021 by the author(s)
PY - 2021
Y1 - 2021
N2 - For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet, a synthetic pose estimation task derived from YCB objects, FMoW-WILDS satellite imagery classification, and wildlife classification in iWildCam-WILDS. The correlation holds across model architectures, hyperparameters, training set size, and training duration, and is more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.
AB - For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet, a synthetic pose estimation task derived from YCB objects, FMoW-WILDS satellite imagery classification, and wildlife classification in iWildCam-WILDS. The correlation holds across model architectures, hyperparameters, training set size, and training duration, and is more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.
UR - http://www.scopus.com/inward/record.url?scp=85140253411&partnerID=8YFLogxK
M3 - منشور من مؤتمر
T3 - Proceedings of Machine Learning Research
SP - 7721
EP - 7735
BT - Proceedings of the 38th International Conference on Machine Learning, ICML 2021
PB - ML Research Press
Y2 - 18 July 2021 through 24 July 2021
ER -