An algorithm for the principal component analysis of large data sets

Nathan Halko, Per Gunnar Martinsson, Yoel Shkolnisky, Mark Tygert

Research output: Contribution to journal · Article · peer-review


Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy, even on parallel processors, unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently out-of-core.) We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.
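The randomized method the abstract refers to follows the now-standard range-finder pattern: sketch the (centered) data matrix with a random test matrix, orthonormalize, and compute a small deterministic SVD in the resulting subspace. The following is a minimal in-core NumPy sketch of that pattern; the function name and the oversampling and power-iteration parameters are illustrative choices, and the paper's actual contribution — streaming the data from disk in blocks so that only a small slice ever resides in RAM — is omitted here for brevity.

```python
import numpy as np

def randomized_pca(A, k, n_oversample=10, n_iter=2, seed=None):
    """Approximate rank-k PCA of A via a randomized range finder.

    Returns (U, s, Vt, mu): left factors, singular values, right factors
    of the column-centered matrix A - mu.  Illustrative sketch only; an
    out-of-core version would apply each product A @ X and A.T @ Y one
    block of rows at a time, reading the blocks from disk.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = k + n_oversample                    # sketch width > target rank
    mu = A.mean(axis=0)                     # PCA acts on centered data
    Ac = A - mu
    # Sample the range of Ac with a Gaussian test matrix, orthonormalize.
    Q = np.linalg.qr(Ac @ rng.standard_normal((n, l)))[0]
    # A few power iterations sharpen accuracy when singular values decay slowly.
    for _ in range(n_iter):
        Q = np.linalg.qr(Ac.T @ Q)[0]
        Q = np.linalg.qr(Ac @ Q)[0]
    # Project into the l-dimensional subspace; small deterministic SVD there.
    B = Q.T @ Ac                            # l x n, fits easily in RAM
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k], mu
```

On a matrix whose numerical rank does not exceed k, the approximation is exact up to rounding, which makes the sketch easy to sanity-check against `np.linalg.svd`.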

Original language: English
Pages (from-to): 2580-2594
Number of pages: 15
Journal: SIAM Journal on Scientific Computing
Issue number: 5
State: Published - 2011

Keywords

  • Algorithm
  • Low rank
  • PCA
  • Principal component analysis
  • SVD
  • Singular value decomposition

All Science Journal Classification (ASJC) codes

  • Computational Mathematics
  • Applied Mathematics


