TY - JOUR
T1 - Ranking and combining multiple predictors without labeled data
AU - Parisi, Fabio
AU - Strino, Francesco
AU - Nadler, Boaz
AU - Kluger, Yuval
N1 - American-Italian Cancer Foundation; Israeli Science Foundation; Citi Foundation; Peter T. Rowley Breast Cancer Research Projects (New York State Department of Health); National Institutes of Health Grant [R0-1 CA158167]. We thank Amit Singer, Alex Kovner, Ronald Coifman, Ronen Basri, and Joseph Chang for their invaluable feedback. The Wisconsin breast cancer dataset was collected at the University of Wisconsin Hospitals by Dr. W.H. Wolberg and colleagues. F.S. is supported by the American-Italian Cancer Foundation. B.N. is supported by grants from the Israeli Science Foundation and from Citi Foundation. Y.K. is supported by the Peter T. Rowley Breast Cancer Research Projects (New York State Department of Health) and National Institutes of Health Grant R0-1 CA158167.
PY - 2014/1/28
Y1 - 2014/1/28
AB - In a broad range of classification and decision-making problems, one is given the advice or predictions of several classifiers, of unknown reliability, over multiple questions or queries. This scenario is different from the standard supervised setting, where each classifier's accuracy can be assessed using available labeled data, and raises two questions: Given only the predictions of several classifiers over a large set of unlabeled test data, is it possible to (i) reliably rank them and (ii) construct a metaclassifier more accurate than most classifiers in the ensemble? Here we present a spectral approach to address these questions. First, assuming conditional independence between classifiers, we show that the off-diagonal entries of their covariance matrix correspond to a rank-one matrix. Moreover, the classifiers can be ranked using the leading eigenvector of this covariance matrix, because its entries are proportional to their balanced accuracies. Second, via a linear approximation to the maximum likelihood estimator, we derive the Spectral Meta-Learner (SML), an unsupervised ensemble classifier whose weights are equal to these eigenvector entries. On both simulated and real data, SML typically achieves a higher accuracy than most classifiers in the ensemble and can provide a better starting point than majority voting for estimating the maximum likelihood solution. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to veer the ensemble prediction away from the (unknown) ground truth.
UR - http://www.scopus.com/inward/record.url?scp=84893410668&partnerID=8YFLogxK
U2 - 10.1073/pnas.1219097111
DO - 10.1073/pnas.1219097111
M3 - Article
SN - 0027-8424
VL - 111
SP - 1253
EP - 1258
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 4
ER -