Neural text embeddings in psychological research: A guide with examples in R.

Research output: Contribution to journal › Article › peer-review

Abstract

In this guide, we review neural embedding models and compare three methods of quantifying psychological constructs for use with embeddings: distributed dictionary representation, contextualized construct representation, and a novel approach, correlational anchored vectors. We aim to cultivate an intuition for the geometric properties of neural embeddings and a sensitivity to methodological problems that can arise in their use. We argue that while large language model embeddings have the advantage of contextualization, decontextualized word embeddings may generalize better across text genres when using cosine or dot product similarity metrics. The three methods of operationalizing psychological constructs in vector space likewise each have their advantages in particular applications. We recommend distributed dictionary representation, which derives a vector representation from a word list, for quantifying abstract constructs relating to the overall feel of a text, especially when the research requires that these constructs generalize across multiple genres of text. We recommend contextualized construct representation, which derives a representation from a questionnaire, for cases in which texts are relatively similar in content to the embedded questionnaire, such as experiments in which participants are asked to respond to a related prompt. Correlational anchored vectors, which derives a representation from labeled examples, requires suitably large and reliable training data.
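
To make the word-list idea concrete, the following is a minimal base R sketch of distributed dictionary representation as the abstract describes it: a construct vector derived from a word list, then compared with a text by cosine similarity. It is not the article's own code. The embedding values, the word list, the example texts, and the helper names `mean_vector` and `cosine_similarity` are all invented for illustration, and averaging word vectors is an assumed aggregation choice; a real analysis would load pretrained word embeddings instead.

```r
# Toy word embeddings: one row per word, three dimensions.
# These values are made up for illustration only.
embeddings <- rbind(
  happy = c(0.9, 0.1, 0.0),
  joy   = c(0.8, 0.2, 0.1),
  sad   = c(-0.7, 0.3, 0.0),
  table = c(0.0, 0.1, 0.9)
)

# Average the embeddings of all tokens found in the vocabulary
# (hypothetical helper; averaging is an assumption of this sketch).
mean_vector <- function(tokens, emb) {
  known <- tokens[tokens %in% rownames(emb)]
  if (length(known) == 0) stop("no tokens found in the embedding vocabulary")
  colMeans(emb[known, , drop = FALSE])
}

# Cosine similarity between two numeric vectors.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Construct vector derived from a (hypothetical) word list.
dictionary <- c("happy", "joy")
construct_vector <- mean_vector(dictionary, embeddings)

# Two tokenized example texts, scored against the construct.
text_a <- c("joy", "happy", "table")
text_b <- c("sad", "table")

score_a <- cosine_similarity(mean_vector(text_a, embeddings), construct_vector)
score_b <- cosine_similarity(mean_vector(text_b, embeddings), construct_vector)

print(c(text_a = score_a, text_b = score_b))
```

In this toy setup the first text scores higher than the second because its averaged vector points in a direction closer to the construct vector. Replacing `cosine_similarity` with a plain dot product (`sum(a * b)`) would additionally let vector length influence the score.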

Original language: American English
Journal: Psychological Methods
State: Accepted/In press - 1 Jan 2025

Keywords

  • R
  • large language models
  • natural language processing
  • text embeddings
  • word embeddings

All Science Journal Classification (ASJC) codes

  • Psychology (miscellaneous)
