TY - JOUR
T1 - Multidimensional scaling of noisy high dimensional data
AU - Peterfreund, Erez
AU - Gavish, Matan
N1 - Funding Information: The authors thank Zohar Yachini and Shay Ben Elazar for fascinating discussions on the applications of MDS. This work has been supported in part by ICRC Blavatnik Interdisciplinary Research Center (Tel Aviv University), Federmann Research Center (Hebrew University), Israel Science Foundation research grant 1523/16 and German-Israeli Foundation for Scientific Research and Development (GIF) Program no. I-1100-407.1-2015. Publisher Copyright: © 2020 The Authors
PY - 2021/3
Y1 - 2021/3
N2 - Multidimensional Scaling (MDS) is a classical technique for embedding data in low dimensions, still in widespread use today. In this paper we study MDS in a modern setting - specifically, high dimensions and ambient measurement noise. We show that as the ambient noise level increases, MDS suffers a sharp breakdown that depends on the data dimension and noise level, and derive an explicit formula for this breakdown point in the case of white noise. We then introduce MDS+, a simple variant of MDS, which applies a shrinkage nonlinearity to the eigenvalues of the MDS similarity matrix. Under a natural loss function measuring the embedding quality, we prove that MDS+ is the unique, asymptotically optimal shrinkage function. MDS+ offers improved embedding, sometimes significantly so, compared with MDS. Importantly, MDS+ calculates the optimal embedding dimension, into which the data should be embedded.
KW - Dimensionality reduction
KW - Euclidean embedding
KW - MDS+
KW - Multidimensional scaling
KW - Optimal shrinkage
KW - Singular value thresholding
UR - http://www.scopus.com/inward/record.url?scp=85099221314&partnerID=8YFLogxK
U2 - https://doi.org/10.1016/j.acha.2020.11.006
DO - https://doi.org/10.1016/j.acha.2020.11.006
M3 - Article
SN - 1063-5203
VL - 51
SP - 333
EP - 373
JO - Applied and Computational Harmonic Analysis
JF - Applied and Computational Harmonic Analysis
ER -