To interpret or not to interpret PCA? This is our question

Dan Vilenchik, Barak Yichye, Maor Abutbul

Research output: Contribution to conference › Paper › peer-review

Abstract

Principal Component Analysis (PCA) is a central tool for analyzing data, and social media data in particular. Typically, the data is projected onto the first two principal components (PCs) to obtain a two-dimensional view, and trends and patterns are then examined. A key to making sense of the projected data is the semantic interpretation of the new axes (the PCs). To label a PC, one usually looks at its top k vector entries in absolute value and assigns meaning according to them. The choice of k is made by “eyeballing” the vector. In this work, we provide a computational framework to support this process and suggest an interpretability score, which measures how sensitive the interpretation step could be to the choice of k. Furthermore, we give a visual method for choosing the optimal k. We study our methodology on four social media platforms and discover that in two of them, Twitter and Instagram, interpretation can be done in a carefree manner, but in Steam and LinkedIn there is no natural labeling of the axes. This separation is clearly reflected in the interpretability score that each dataset received.
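
The labeling step described in the abstract (project onto the first two PCs, then inspect the top k entries of each PC by absolute value) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' framework: the function name `top_k_loadings`, its parameters, and the feature names are hypothetical, and the sketch does not compute the paper's interpretability score.

```python
import numpy as np

def top_k_loadings(X, feature_names, k=3, n_components=2):
    """Project X onto its first PCs and list each PC's top-k entries by |value|.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature (column)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = Vt[:n_components]
    projected = Xc @ pcs.T  # low-dimensional view of the data
    labels = []
    for pc in pcs:
        idx = np.argsort(-np.abs(pc))[:k]  # indices of the k largest |entries|
        labels.append([(feature_names[i], float(pc[i])) for i in idx])
    return projected, labels
```

Calling the function with several values of `k` and comparing the returned feature lists gives a rough sense of how much the label of an axis depends on that choice, which is the sensitivity the abstract refers to.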

Original language: American English
Pages: 655-658
Number of pages: 4
State: Published - 1 Jan 2019
Event: 13th International Conference on Web and Social Media, ICWSM 2019 - Munich, Germany
Duration: 11 Jun 2019 → 14 Jun 2019

Conference

Conference: 13th International Conference on Web and Social Media, ICWSM 2019
Country/Territory: Germany
City: Munich
Period: 11/06/19 → 14/06/19

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
