TY - JOUR
T1 - Do semidefinite relaxations solve sparse PCA up to the information limit?
AU - Krauthgamer, Robert
AU - Nadler, Boaz
AU - Vilenchik, Dan
N1 - Published at http://dx.doi.org/10.1214/15-AOS1310 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
PY - 2015/6
Y1 - 2015/6
N2 - Estimating the leading principal components of data, assuming they are sparse, is a central task in modern high-dimensional statistics. Many algorithms were developed for this sparse PCA problem, from simple diagonal thresholding to sophisticated semidefinite programming (SDP) methods. A key theoretical question is under what conditions can such algorithms recover the sparse principal components? We study this question for a single-spike model with an l(0)-sparse eigenvector, in the asymptotic regime as dimension p and sample size n both tend to infinity. Amini and Wainwright [Ann. Statist. 37 (2009) 2877-2921] proved that for sparsity levels k >= Omega (n/ log p), no algorithm, efficient or not, can reliably recover the sparse eigenvector. In contrast, for k = Omega(root n), the proposed SDP approach, at least in its standard usage, cannot recover the sparse spike. In fact, we conjecture that in the single-spike model, no computationally-efficient algorithm can recover a spike of l(0)-sparsity k >= Omega (root n). Finally, we present empirical results suggesting that up to sparsity levels k = O (root n), recovery is possible by a simple covariance thresholding algorithm.
AB - Estimating the leading principal components of data, assuming they are sparse, is a central task in modern high-dimensional statistics. Many algorithms were developed for this sparse PCA problem, from simple diagonal thresholding to sophisticated semidefinite programming (SDP) methods. A key theoretical question is under what conditions can such algorithms recover the sparse principal components? We study this question for a single-spike model with an l(0)-sparse eigenvector, in the asymptotic regime as dimension p and sample size n both tend to infinity. Amini and Wainwright [Ann. Statist. 37 (2009) 2877-2921] proved that for sparsity levels k >= Omega (n/ log p), no algorithm, efficient or not, can reliably recover the sparse eigenvector. In contrast, for k = Omega(root n), the proposed SDP approach, at least in its standard usage, cannot recover the sparse spike. In fact, we conjecture that in the single-spike model, no computationally-efficient algorithm can recover a spike of l(0)-sparsity k >= Omega (root n). Finally, we present empirical results suggesting that up to sparsity levels k = O (root n), recovery is possible by a simple covariance thresholding algorithm.
KW - math.ST
KW - stat.ML
KW - stat.TH
U2 - https://doi.org/10.1214/15-AOS1310
DO - https://doi.org/10.1214/15-AOS1310
M3 - مقالة
SN - 0090-5364
VL - 43
SP - 1300
EP - 1322
JO - Annals of Statistics
JF - Annals of Statistics
IS - 3
ER -