TY - JOUR
T1 - Online Learning of Noisy Data
AU - Cesa-Bianchi, Nicolò
AU - Shalev-Shwartz, Shai
AU - Shamir, Ohad
N1 - Funding Information: Manuscript received September 02, 2010; revised December 29, 2010; accepted July 08, 2011. Date of publication September 08, 2011; date of current version December 07, 2011. The material in this paper was presented at the COLT 2010 conference. This work was supported in part by the Israeli Science Foundation under Grant 590-10 and in part by the PASCAL2 Network of Excellence under EC Grant 216886.
PY - 2011/12
Y1 - 2011/12
AB - We study online learning of linear and kernel-based predictors, when individual examples are corrupted by random noise, and both examples and noise type can be chosen adversarially and change over time. We begin with the setting where some auxiliary information on the noise distribution is provided, and we wish to learn predictors with respect to the squared loss. Depending on the auxiliary information, we show how one can learn linear and kernel-based predictors, using just 1 or 2 noisy copies of each example. We then turn to discuss a general setting where virtually nothing is known about the noise distribution, and one wishes to learn with respect to general losses and using linear and kernel-based predictors. We show how this can be achieved using a random, essentially constant number of noisy copies of each example. Allowing multiple copies cannot be avoided: Indeed, we show that the setting becomes impossible when only one noisy copy of each instance can be accessed. To obtain our results we introduce several novel techniques, some of which might be of independent interest.
UR - http://www.scopus.com/inward/record.url?scp=83255166616&partnerID=8YFLogxK
U2 - 10.1109/TIT.2011.2164053
DO - 10.1109/TIT.2011.2164053
M3 - Article
SN - 0018-9448
VL - 57
SP - 7907
EP - 7931
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 12
M1 - 6015553
ER -