TY - JOUR
T1 - Projection pursuit in high dimensions
AU - Bickel, Peter J.
AU - Kur, Gil
AU - Nadler, Boaz
N1 - We thank Trevor Hastie, Sourav Chatterjee, Gilles Blanchard, and David Donoho for constructive suggestions. P.J.B. is supported by NSF Grant DMS-1713082. B.N. is supported by a grant from the US–Israel Binational Science Foundation. B.N. is an incumbent of the William Petschek professorial chair of mathematics. P.J.B., G.K., and B.N. designed research, performed research, analyzed data, and wrote the paper.
PY - 2018/9/11
Y1 - 2018/9/11
N2 - Projection pursuit is a classical exploratory data analysis method to detect interesting low-dimensional structures in multivariate data. Originally, projection pursuit was applied mostly to data of moderately low dimension. Motivated by contemporary applications, we here study its properties in high-dimensional settings. Specifically, we analyze the asymptotic properties of projection pursuit on structureless multivariate Gaussian data with an identity covariance, as both dimension p and sample size n tend to infinity, with p/n → γ ∈ [0, ∞]. Our main results are that (i) if γ = ∞, then there exist projections whose corresponding empirical cumulative distribution function can approximate any arbitrary distribution; and (ii) if γ ∈ (0, ∞), not all limiting distributions are possible. However, depending on the value of γ, various non-Gaussian distributions may still be approximated. In contrast, if we restrict to sparse projections, involving only a few of the p variables, then asymptotically all empirical cumulative distribution functions are Gaussian. And (iii) if γ = 0, then asymptotically all projections are Gaussian. Some of these results extend to mean-centered sub-Gaussian data and to projections into k dimensions. Hence, in the “small n, large p” setting, unless sparsity is enforced, and regardless of the chosen projection index, projection pursuit may detect an apparent structure that has no statistical significance. Furthermore, our work reveals fundamental limitations on the ability to detect non-Gaussian signals in high-dimensional data, in particular through independent component analysis and related non-Gaussian component analysis.
U2 - 10.1073/pnas.1801177115
DO - 10.1073/pnas.1801177115
M3 - Article
SN - 0027-8424
VL - 115
SP - 9151
EP - 9156
JO - PNAS
JF - PNAS
IS - 37
ER -