Improving term weighting for community question answering search using syntactic analysis

David Carmel, Avihai Mejer, Yuval Pinter, Idan Szpektor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Query term weighting is a fundamental task in information retrieval, and most popular term weighting schemes are primarily based on statistical analysis of term occurrences within the document collection. In this work we study how term weighting may benefit from syntactic analysis of the corpus. Focusing on Community-based Question Answering (CQA) sites, we take into account the syntactic function of the terms within CQA texts as an important factor affecting their relative importance for retrieval. We analyze a large log of web queries that landed on the Yahoo Answers site, showing a strong deviation between the tendencies of different document words to appear in a landing (click-through) query given their syntactic function. To this end, we propose a novel term weighting method that makes use of the syntactic information available for each query term occurrence in the document, on top of term occurrence statistics. The relative importance of each feature is learned via a learning-to-rank algorithm that utilizes a click-through query log. We examine the new weighting scheme using manual evaluation based on editorial data and using automatic evaluation over the query log. Our experimental results show consistent improvement in retrieval when syntactic information is taken into account.
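The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the role inventory (`ROLES`), the toy documents, the smoothed IDF statistic, and the pairwise perceptron used in place of the paper's learning-to-rank algorithm are all assumptions chosen for brevity. Each matched query term contributes its IDF to a feature indexed by the syntactic role of that occurrence, and the per-role weights are learned from (clicked, non-clicked) document pairs.

```python
import math
from collections import Counter

# Hypothetical role inventory; stands in for POS tags or dependency labels.
ROLES = ["NOUN", "VERB", "ADJ"]

def inverse_document_frequency(docs):
    """Smoothed IDF over documents given as lists of (term, role) pairs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in {t for t, _ in doc})
    return {term: math.log(1 + n / count) for term, count in df.items()}

def features(query_terms, doc, idf):
    """Sum the IDF of matched query terms, split by the role of each occurrence."""
    vec = [0.0] * len(ROLES)
    for term, role in doc:
        if term in query_terms:
            vec[ROLES.index(role)] += idf.get(term, 0.0)
    return vec

def score(weights, vec):
    """Linear retrieval score: learned role weights times role features."""
    return sum(w * x for w, x in zip(weights, vec))

def train_pairwise(pairs, epochs=20, lr=0.1):
    """Pairwise perceptron (an assumed stand-in for learning to rank).

    Each pair is (clicked_vec, non_clicked_vec) for the same query.
    """
    weights = [1.0] * len(ROLES)
    for _ in range(epochs):
        for pos, neg in pairs:
            if score(weights, pos) <= score(weights, neg):
                weights = [w + lr * (p - n) for w, p, n in zip(weights, pos, neg)]
    return weights

# Toy data (invented for illustration): the same term in two syntactic roles.
clicked_doc = [("apple", "NOUN"), ("pie", "NOUN"), ("bake", "VERB")]
skipped_doc = [("apple", "ADJ"), ("phone", "NOUN"), ("buy", "VERB")]
idf = inverse_document_frequency([clicked_doc, skipped_doc])

query = {"apple"}
clicked_vec = features(query, clicked_doc, idf)
skipped_vec = features(query, skipped_doc, idf)

learned = train_pairwise([(clicked_vec, skipped_vec)])
print(dict(zip(ROLES, learned)))
```

With purely statistical weighting the two toy documents tie for the query, since the term occurs once in each; the learned role weights break the tie in favor of the occurrence whose role was associated with a click.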

Original language: American English
Title of host publication: CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
Pages: 351-360
Number of pages: 10
ISBN (Electronic): 9781450325981
State: Published - 3 Nov 2014
Externally published: Yes
Event: 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014 - Shanghai, China
Duration: 3 Nov 2014 to 7 Nov 2014

Publication series

Name: CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management

Conference

Conference: 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014
Country/Territory: China
City: Shanghai
Period: 3/11/14 to 7/11/14

Keywords

  • Community question answering
  • Dependency parsing
  • Learning to rank
  • Part-of-speech tagging
  • Term weighting

All Science Journal Classification (ASJC) codes

  • Information Systems and Management
  • Computer Science Applications
  • Information Systems
