Outlier-robust PCA: The high-dimensional case

Huan Xu, Constantine Caramanis, Shie Mannor

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

Principal component analysis (PCA) plays a central role in statistics, engineering, and science. Because corrupted data are prevalent in real-world applications, much research has focused on developing robust algorithms. Perhaps surprisingly, these algorithms are unequipped (indeed, unable) to deal with outliers in the high-dimensional setting, where the number of observations is of the same magnitude as the number of variables in each observation and the dataset contains some arbitrarily corrupted observations. We propose a high-dimensional robust principal component analysis algorithm that is efficient, robust to contaminated points, and easily kernelizable. In particular, our algorithm achieves maximal robustness: it has a breakdown point of 50% (the best possible), while all existing algorithms have a breakdown point of zero. Moreover, our algorithm recovers the optimal solution exactly when the number of corrupted points grows sublinearly in the dimension.
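
What a breakdown point of zero means can be made concrete with classical PCA, which has that property: a single corrupted observation of large enough magnitude can rotate the leading principal direction arbitrarily far from the true one. The sketch below uses NumPy on assumed synthetic data (n = p = 200, one strong signal direction plus isotropic noise) and is not the authors' algorithm. The helper `trimmed_variance`, which scores a candidate direction by the mean of its smallest half of squared projections, is a hypothetical illustration of a robust variance score, included only to show that a trimmed statistic is not fooled by the single outlier.

```python
import numpy as np


def top_direction(X):
    """Leading principal direction of the rows of X (classical, non-robust PCA)."""
    # Uncentered second-moment matrix; the synthetic data below have zero mean.
    cov = X.T @ X / X.shape[0]
    _, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]


def trimmed_variance(X, w, keep_frac=0.5):
    """Hypothetical robust score: mean of the smallest keep_frac fraction of
    squared projections onto w. Illustrative assumption, not the paper's estimator."""
    proj_sq = np.sort((X @ w) ** 2)
    k = max(1, int(keep_frac * len(proj_sq)))
    return proj_sq[:k].mean()


rng = np.random.default_rng(0)
n, p = 200, 200  # high-dimensional regime: n of the same order as p

# Authentic points: variance 25 along e_0, unit variance in every other coordinate.
X = rng.normal(size=(n, p))
X[:, 0] *= 5.0
true_dir = np.eye(p)[0]

w_clean = top_direction(X)

# Corrupt a single observation: replace it with a huge point along e_1.
X_bad = X.copy()
X_bad[0] = 0.0
X_bad[0, 1] = 1e4
w_bad = top_direction(X_bad)

print(abs(w_clean @ true_dir))  # near 1: clean PCA finds the signal
print(abs(w_bad @ true_dir))    # near 0: one outlier hijacked the component

# The trimmed score discards the outlier's projection and prefers the true direction.
print(trimmed_variance(X_bad, w_clean) > trimmed_variance(X_bad, w_bad))
```

Because the outlier contributes a single enormous squared projection, trimming the largest half of the projections removes it entirely, whereas the classical variance is dominated by it. Corrupting any fixed fraction of points would likewise require a trimming level at least that large, which is why 50% is the natural ceiling for such estimators.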

Original language: English
Article number: 6307864
Pages: 546-572
Number of pages: 27
Journal: IEEE Transactions on Information Theory
Volume: 59
Issue number: 1
State: Published (2013)

Keywords

  • Dimension reduction
  • outlier
  • principal component analysis (PCA)
  • robustness
  • statistical learning

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
