Unsupervised discovery of non-trivial similarities between online communities

Abraham Israeli, Shani Cohen, Oren Tsur

Research output: Contribution to journal › Article › peer-review


Language is used differently across communities. The differences may be manifested in vocabulary, style, and semantics, and they enable the exploration of nuanced similarities and differences between communities. In this work, we introduce C3, a novel unsupervised approach for community comparison. C3 creates contextual pairwise representations by aligning communities and tuning word embeddings according to both the lexical context and the social context reflected by the community's structure and engagement patterns. Specifically, C3 takes into account the semantic relations between pairs of words, as reflected by the embedding model of each community, and leverages the social context and users' roles in their community to calculate a similarity measure between community pairs. C3 is evaluated over a dataset of 1565 active Reddit communities, comparing results against three competitive models. We show through an array of experiments and validations that C3 recovers nuanced and non-trivial similarities between communities that are not captured by any of the competitive models. We complement the quantitative results with a qualitative analysis, discussing recovered non-trivial similarities between community pairs such as opiates and adhd, babyBumps and depression, and wallStreetBets and sandersForPresident, all of which are recovered by C3 but not by any of the other models. This qualitative analysis demonstrates the exploratory power of our model.

Original language: American English
Article number: 117900
Journal: Expert Systems with Applications
State: Published - 15 Nov 2022


Keywords

  • Computational social science
  • Machine learning
  • Natural language processing
  • Online communities
  • Social network analysis
  • Word embeddings

All Science Journal Classification (ASJC) codes

  • General Engineering
  • Artificial Intelligence
  • Computer Science Applications

