Modularity-based query clustering for identifying users sharing a common condition

Maayan Harel, Elad Yom-Tov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We present an algorithm for identifying users who share a common condition from anonymized search engine logs. The input to the algorithm is a set of seed phrases that identify users with the condition of interest with high precision, albeit with very low recall. We expand the set of seed phrases by clustering queries according to the pages users clicked following these queries and the temporal ordering of queries within sessions, emphasizing the subgraph containing the seed phrases. To this end, we extend modularity-based clustering so that it uses the information in the initial seed phrases as well as the other queries of users in the population of interest. We evaluate the proposed method on two datasets, one of mood disorders and the other of anorexia, by classifying users according to the clusters in which they appeared and the phrases those clusters contained, and show that the area under the receiver operating characteristic curve (AUC) obtained by these methods exceeds 0.87. These results demonstrate the value of our algorithm both for identifying users for future research and for gaining a better understanding of the language associated with the condition.
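The abstract does not spell out the authors' extension of modularity, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. Queries are nodes; an edge joins two queries that are consecutive within a session or that led to clicks on the same page. A plain Louvain-style local-moving pass maximizes standard Newman modularity, and the expanded phrase set is every query that ends up in a cluster containing a seed phrase. The `seed_weight` upweighting of seed-incident edges is a hypothetical stand-in for "emphasizing the subgraph containing seed phrases"; all function names, queries, and URLs are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_query_graph(sessions, clicks, seeds=(), seed_weight=1.0):
    """Build a weighted, undirected query graph as {query: {neighbour: weight}}.

    sessions: list of sessions, each a time-ordered list of query strings.
    clicks: iterable of (query, clicked_url) pairs.
    seed_weight: HYPOTHETICAL multiplier for edges touching a seed phrase.
    """
    weights = defaultdict(float)

    def link(a, b):
        if a != b:
            weights[frozenset((a, b))] += 1.0

    # Temporal ordering: link consecutive queries within each session.
    for session in sessions:
        for a, b in zip(session, session[1:]):
            link(a, b)
    # Co-clicks: link every pair of queries that led to the same page.
    queries_by_url = defaultdict(set)
    for query, url in clicks:
        queries_by_url[url].add(query)
    for queries in queries_by_url.values():
        for a, b in combinations(sorted(queries), 2):
            link(a, b)

    seeds = set(seeds)
    adj = defaultdict(dict)
    for edge, w in weights.items():
        a, b = tuple(edge)
        if a in seeds or b in seeds:
            w *= seed_weight  # hypothetical emphasis on the seed subgraph
        adj[a][b] = w
        adj[b][a] = w
    return dict(adj)


def modularity(adj, comm):
    """Newman modularity: Q = sum_c [ L_c / m - (d_c / 2m)^2 ]."""
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())
    if two_m == 0:
        return 0.0
    inside = defaultdict(float)  # A_ij summed over ordered pairs inside c (= 2 L_c)
    degree = defaultdict(float)  # d_c, total degree of community c
    for i, nbrs in adj.items():
        degree[comm[i]] += sum(nbrs.values())
        for j, a_ij in nbrs.items():
            if comm[i] == comm[j]:
                inside[comm[i]] += a_ij
    return sum(inside[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def greedy_local_moving(adj):
    """Each node starts in its own community and moves to the neighbouring
    community that most increases Q, until no single move helps.
    Q is recomputed from scratch per candidate: fine for toy graphs only."""
    comm = {node: node for node in adj}
    improved = True
    while improved:
        improved = False
        for node in sorted(adj):
            best_c, best_q = comm[node], modularity(adj, comm)
            for c in sorted({comm[nbr] for nbr in adj[node]}):
                old = comm[node]
                comm[node] = c
                q = modularity(adj, comm)
                comm[node] = old
                if q > best_q + 1e-12:
                    best_c, best_q = c, q
            if best_c != comm[node]:
                comm[node] = best_c
                improved = True
    return comm


def expand_seeds(comm, seeds):
    """Return every query that shares a cluster with at least one seed phrase."""
    seed_clusters = {comm[s] for s in seeds if s in comm}
    return {q for q, c in comm.items() if c in seed_clusters}


# Toy, made-up log data.
sessions = [["feeling hopeless", "depression symptoms"],
            ["pizza near me", "pizza delivery"]]
clicks = [("depression symptoms", "site-a.example/mood"),
          ("ssri side effects", "site-a.example/mood"),
          ("pizza near me", "site-b.example/menu"),
          ("pizza delivery", "site-b.example/menu")]
seeds = {"feeling hopeless"}

graph = build_query_graph(sessions, clicks, seeds)
comm = greedy_local_moving(graph)
print(sorted(expand_seeds(comm, seeds)))
# → ['depression symptoms', 'feeling hopeless', 'ssri side effects']
```

Here "ssri side effects" never co-occurs with the seed in a session, yet it joins the seed's cluster through a shared click, which is the kind of expansion the abstract describes. A realistic version would need incremental modularity-gain updates and Louvain's community-aggregation phase, both omitted for brevity.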

Original language: English
Title of host publication: SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 819-822
Number of pages: 4
ISBN (Electronic): 9781450336215
State: Published - 9 Aug 2015
Externally published: Yes
Event: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 - Santiago, Chile
Duration: 9 Aug 2015 - 13 Aug 2015

Publication series

Name: SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015
Country/Territory: Chile
City: Santiago
Period: 9/08/15 - 13/08/15

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
