Abstract
Principal component analysis plays a central role in statistics, engineering, and science. Because of the prevalence of corrupted data in real-world applications, much research has focused on developing robust algorithms. Perhaps surprisingly, these algorithms are unequipped (indeed, unable) to deal with outliers in the high-dimensional setting, where the number of observations is of the same magnitude as the number of variables of each observation and the dataset contains some (arbitrarily) corrupted observations. We propose a high-dimensional robust principal component analysis algorithm that is efficient, robust to contaminated points, and easily kernelizable. In particular, our algorithm achieves maximal robustness: it has a breakdown point of 50% (the best possible), while all existing algorithms have a breakdown point of zero. Moreover, our algorithm recovers the optimal solution exactly in the case where the number of corrupted points grows sublinearly in the dimension.
Field | Value |
---|---|
Original language | English |
Article number | 6307864 |
Pages (from-to) | 546-572 |
Number of pages | 27 |
Journal | IEEE Transactions on Information Theory |
Volume | 59 |
Issue number | 1 |
State | Published - 2013 |
Keywords
- Dimension reduction
- outlier
- principal component analysis (PCA)
- robustness
- statistical learning
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences