TY - GEN
T1 - Risk Estimation with Active Labeling
AU - Magnani, Alessandro
AU - Arcaute, Esteban
AU - Mannor, Shie
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - We consider a setting where, for a given model and a given labeling budget, we need to accurately evaluate its risk repeatedly, while the set of items considered for the risk evaluation, as well as the loss function, changes over time. This is a natural setting in a non-stationary environment, such as that of large retail chains, whose catalogue changes year-round. Since evaluating risk often requires human judgement, the cost can increase dramatically over time. We propose a new estimator that minimizes the labeling cost by reusing all available labels when possible and by actively selecting items to be labeled in an optimal way. We show that an optimal sampling profile can be derived, efficiently and at scale, as the solution of an optimization problem. We show how this approach, with only a small added computational and storage cost, can efficiently reduce the labeling work required to measure the risk of a model in a non-stationary environment in a production system. We extend these results to the Fα measure and weighted risks. The presented approach is related to the Horvitz-Thompson estimator, importance sampling, and active learning, and provides a scalable, robust solution for risk evaluation in non-stationary environments that cannot be achieved with either the Horvitz-Thompson estimator or importance sampling.
UR - http://www.scopus.com/inward/record.url?scp=105006902764&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-6400-9_9
DO - 10.1007/978-981-96-6400-9_9
M3 - Conference contribution
SN - 9789819663996
T3 - Communications in Computer and Information Science
SP - 103
EP - 120
BT - Machine Learning and Soft Computing - 9th International Conference, ICMLSC 2025, Revised Selected Papers
A2 - Huang, Letian
PB - Springer Science and Business Media Deutschland GmbH
T2 - 9th International Conference on Machine Learning and Soft Computing, ICMLSC 2025
Y2 - 24 January 2025 through 26 January 2025
ER -