A relational framework for classifier engineering

Benny Kimelfeld, Christopher Ré

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

In the design of analytical procedures and machine-learning solutions, a critical and time-consuming task is that of feature engineering, for which various recipes and tooling approaches have been developed. In this framework paper, we embark on the establishment of database foundations for feature engineering. We propose a formal framework for classification in the context of a relational database. The goal of this framework is to open the way to research and techniques to assist developers with the task of feature engineering by utilizing the database's modeling and understanding of data and queries, and by deploying the well studied principles of database management. As a first step, we demonstrate the usefulness of this framework by formally defining three key algorithmic challenges. The first challenge is that of separability, which is the problem of determining the existence of feature queries that agree with the training examples. The second is that of evaluating the VC dimension of the model class with respect to a given sequence of feature queries. The third challenge is identifiability, which is the task of testing for a property of independence among features that are represented as database queries. We give preliminary results on these challenges for the case where features are defined by means of conjunctive queries, and in particular we study the implication of various traditional syntactic restrictions on the inherent computational complexity.

Original languageEnglish
Title of host publicationPODS 2017 - Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
Pages5-20
Number of pages16
ISBN (Electronic)9781450341981
DOIs
StatePublished - 9 May 2017
Event36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017 - Chicago, United States
Duration: 14 May 201719 May 2017

Publication series

NameProceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
VolumePart F127745

Conference

Conference36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017
Country/TerritoryUnited States
CityChicago
Period14/05/1719/05/17

Keywords

  • Classifiers
  • Conjunctive queries
  • Feature engineering
  • Machine learning
  • Relational databases

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'A relational framework for classifier engineering'. Together they form a unique fingerprint.

Cite this