An algorithm for the principal component analysis of large data sets
Abstract
Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy, even on parallel processors, unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently out-of-core.) We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.
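The abstract's general approach, randomized low-rank approximation applied block-by-block so the full matrix never needs to reside in RAM, can be sketched as follows. This is a minimal illustration of the standard two-pass randomized SVD scheme, not the paper's exact out-of-core implementation: the `row_blocks` callable (an iterable of row blocks, e.g. streamed from disk) is a hypothetical interface introduced here, and mean-centering of the data, which PCA normally requires, is omitted for brevity.

```python
import numpy as np

def randomized_pca(row_blocks, n_cols, k, oversample=10, seed=0):
    """Approximate the top-k singular triplets of a tall matrix A
    whose rows arrive in blocks (e.g. read from disk one chunk at
    a time), using a two-pass randomized scheme.

    `row_blocks` is a callable returning a fresh iterable over the
    row blocks of A on each call (a hypothetical interface, not
    taken from the paper).
    """
    rng = np.random.default_rng(seed)
    l = k + oversample                       # sketch width
    omega = rng.standard_normal((n_cols, l)) # random test matrix

    # Pass 1 over the data: Y = A @ Omega, one row block at a time.
    Y = np.vstack([block @ omega for block in row_blocks()])
    Q, _ = np.linalg.qr(Y)  # orthonormal basis approximating range(A)

    # Pass 2 over the data: B = Q.T @ A, accumulated block by block.
    B = np.zeros((l, n_cols))
    row = 0
    for block in row_blocks():
        B += Q[row:row + block.shape[0]].T @ block
        row += block.shape[0]

    # SVD of the small l-by-n matrix B is cheap and fits in RAM.
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```

Only the l-column sketch `Y`, the basis `Q`, and the small matrix `B` are held in memory; each pass touches the data once, which is what makes the method attractive when the matrix lives on disk.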
| Original language | English |
|---|---|
| Pages (from-to) | 2580-2594 |
| Number of pages | 15 |
| Journal | SIAM Journal on Scientific Computing |
| Volume | 33 |
| Issue number | 5 |
| DOIs | |
| State | Published - 2011 |
Keywords
- Algorithm
- Low rank
- PCA
- Principal component analysis
- SVD
- Singular value decomposition
All Science Journal Classification (ASJC) codes
- Computational Mathematics
- Applied Mathematics