Minimization of Classifier Construction Cost for Search Queries

Shay Gershtein, Tova Milo, Gefen Morami, Slava Novgorodov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Search over massive sets of items is the cornerstone of many modern applications. Users express a set of properties and expect the system to retrieve qualifying items. A common difficulty, however, is that the information on whether an item satisfies the search criteria is not explicitly recorded in the repository. Instead, it may be general knowledge or "hidden" in a picture/description, leading to incomplete search results. To overcome this problem, companies build dedicated classifiers that determine which items satisfy the given criteria. However, building classifiers requires volumes of high-quality labeled training data. Since the costs of training classifiers for different subsets of properties can vastly differ, the choice of which classifiers to train has great monetary significance. The goal of our research is to devise effective algorithms to choose which classifiers one should train to address a given query load while minimizing the cost. Previous work considered a simplified model with uniform classifier costs, and queries with two properties. We remove these restrictions in our model. We prove NP-hard inapproximability bounds and devise several algorithms with approximation guarantees. Moreover, we identify a common special case for which we provide an exact algorithm. Our experiments, performed over real-life datasets, demonstrate the effectiveness and efficiency of our algorithms.

Original language: English
Title of host publication: SIGMOD 2020 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
Pages: 1351-1365
Number of pages: 15
ISBN (Electronic): 9781450367356
State: Published - 14 Jun 2020
Event: 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020 - Portland, United States
Duration: 14 Jun 2020 → 19 Jun 2020

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data

Conference

Conference: 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020
Country/Territory: United States
City: Portland
Period: 14/06/20 → 19/06/20

Keywords

  • classifiers
  • classifiers construction cost
  • e-commerce
  • search queries

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
