TY - GEN
T1 - Improving term weighting for community question answering search using syntactic analysis
AU - Carmel, David
AU - Mejer, Avihai
AU - Pinter, Yuval
AU - Szpektor, Idan
N1 - Publisher Copyright: Copyright 2014 ACM.
PY - 2014/11/3
Y1 - 2014/11/3
AB - Query term weighting is a fundamental task in information retrieval, and most popular term weighting schemes are based primarily on statistical analysis of term occurrences within the document collection. In this work we study how term weighting may benefit from syntactic analysis of the corpus. Focusing on Community-based Question Answering (CQA) sites, we treat the syntactic function of a term within CQA texts as an important factor affecting its relative importance for retrieval. We analyze a large log of web queries that landed on the Yahoo Answers site and show a strong deviation, depending on syntactic function, in the tendency of different document words to appear in a landing (click-through) query. Motivated by this finding, we propose a novel term weighting method that uses the syntactic information available for each query term occurrence in the document, on top of term occurrence statistics. The relative importance of each feature is learned by a learning-to-rank algorithm that uses a click-through query log. We evaluate the new weighting scheme both manually, using editorial data, and automatically, over the query log. Our experimental results show consistent improvement in retrieval when syntactic information is taken into account.
KW - Community question answering
KW - Dependency parsing
KW - Learning to rank
KW - Part-of-speech tagging
KW - Term weighting
UR - http://www.scopus.com/inward/record.url?scp=84937598800&partnerID=8YFLogxK
U2 - https://doi.org/10.1145/2661829.2661901
DO - 10.1145/2661829.2661901
M3 - Conference contribution
T3 - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
SP - 351
EP - 360
BT - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
T2 - 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014
Y2 - 3 November 2014 through 7 November 2014
ER -