Regularizing conjunctive features for classification

Pablo Barceló, Alexander Baumgartner, Victor Dalmau, Benny Kimelfeld

Research output: Contribution to journal › Article › peer-review

Abstract

We consider the feature-generation task wherein we are given a database with entities labeled as positive and negative examples, and we want to find feature queries that linearly separate the two sets of examples. We focus on conjunctive feature queries, and explore two problems: (a) deciding if separating feature queries exist (separability), and (b) generating such queries when they exist. To restrict the complexity of the generated classifiers, we explore various ways of regularizing them by limiting their dimension, the number of joins in feature queries, and their generalized hypertree width (ghw). We show that the separability problem is tractable for bounded ghw; yet the generation problem is not, because the feature queries might be too large. So, we explore a third problem: classifying new entities without necessarily generating the feature queries. Interestingly, in the case of bounded ghw we can efficiently classify without explicitly generating such queries.
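To make the setting concrete, here is a minimal Python sketch under illustrative assumptions: a toy database stored as a dict mapping relation names to sets of tuples, conjunctive feature queries written as lists of atoms in which the variable `x` stands for the entity being classified, and a plain perceptron standing in for a linear separability check on the resulting 0/1 feature vectors. The names `holds`, `feature_vector`, `perceptron` and the relations `Follows`, `Celebrity`, `Posts` are hypothetical and do not come from the paper.

```python
# Hypothetical sketch: evaluate conjunctive feature queries on entities,
# then look for a linear separator over the resulting 0/1 feature vectors.
# Database (assumed format): relation name -> set of tuples.
# Query (assumed format): list of atoms (relation, variables); "x" is the entity.

def holds(db, atoms, entity, free="x"):
    """True if some variable assignment with free=entity satisfies every atom."""
    def extend(i, binding):
        if i == len(atoms):
            return True
        rel, args = atoms[i]
        for tup in db.get(rel, ()):
            if len(tup) != len(args):
                continue
            new = dict(binding)
            # Bind unbound variables; reject the tuple if a bound variable disagrees.
            if all(new.setdefault(var, val) == val for var, val in zip(args, tup)):
                if extend(i + 1, new):
                    return True
        return False
    return extend(0, {free: entity})


def feature_vector(db, queries, entity):
    """One 0/1 coordinate per feature query (the dimension is len(queries))."""
    return [1 if holds(db, q, entity) else 0 for q in queries]


def perceptron(pos, neg, max_epochs=1000):
    """Search for (w, b) with w.v + b > 0 on pos and < 0 on neg.
    Returns None if nothing is found within max_epochs; this is NOT a
    proof of non-separability, only a bounded search."""
    data = [(v, 1) for v in pos] + [(v, -1) for v in neg]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for v, y in data:
            if y * (sum(wi * vi for wi, vi in zip(w, v)) + b) <= 0:
                w = [wi + y * vi for wi, vi in zip(w, v)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None


# Illustrative toy data (invented for this sketch).
db = {
    "Follows": {("alice", "carol"), ("bob", "carol"), ("dave", "bob")},
    "Celebrity": {("carol",)},
    "Posts": {("bob", "p1"), ("dave", "p2")},
}
queries = [
    [("Follows", ("x", "y")), ("Celebrity", ("y",))],  # follows a celebrity (one join on y)
    [("Posts", ("x", "p"))],                           # has posted something
]
vec = {e: feature_vector(db, queries, e) for e in ["alice", "bob", "carol", "dave"]}

# Positives {alice, bob} vs. negatives {carol, dave}: separable via the first query.
model = perceptron([vec["alice"], vec["bob"]], [vec["carol"], vec["dave"]])
# An XOR-style labeling of the same vectors: no linear separator exists.
xor_model = perceptron([vec["alice"], vec["dave"]], [vec["bob"], vec["carol"]])
```

Here the feature queries are fixed in advance, so the sketch only shows what "linearly separating" means for a given set of conjunctive features; it does not search over the space of queries, which is what the separability and generation problems above ask about. An exact separability test for fixed features would solve a small linear program rather than run a capped perceptron.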

Original language: English
Pages (from-to): 97-124
Number of pages: 28
Journal: Journal of Computer and System Sciences
Volume: 119
State: Published - Aug 2021

Keywords

  • Classification
  • Conjunctive queries
  • Feature generation
  • Generalized hypertree width
  • Separability

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Networks and Communications
  • Computational Theory and Mathematics
  • Applied Mathematics