TY - GEN
T1 - Regularizing conjunctive features for classification
AU - Barceló, Pablo
AU - Baumgartner, Alexander
AU - Dalmau, Victor
AU - Kimelfeld, Benny
N1 - Publisher Copyright: © 2019 ACM.
PY - 2019/6/13
Y1 - 2019/6/13
AB - We consider the feature-generation task wherein we are given a database with entities labeled as positive and negative examples, and the goal is to find feature queries that allow for a linear separation between the two sets of examples. We focus on conjunctive feature queries, and explore two fundamental problems: (a) deciding whether separating feature queries exist (separability), and (b) generating such queries when they exist. In the approximate versions of these problems, we allow a predefined fraction of the examples to be misclassified. To restrict the complexity of the generated classifiers, we explore various ways of regularizing (i.e., imposing simplicity constraints on) them by limiting their dimension, the number of joins in feature queries, and their generalized hypertree width (ghw). Among other results, we show that the separability problem is tractable in the case of bounded ghw; yet, the generation problem is intractable, simply because the feature queries might be too large. So, we explore a third problem: classifying new entities without necessarily generating the feature queries. Interestingly, in the case of bounded ghw we can efficiently classify without ever explicitly generating the feature queries.
KW - Classification
KW - Conjunctive queries
KW - Feature generation
KW - Generalized hypertree width
KW - Separability
UR - http://www.scopus.com/inward/record.url?scp=85067191422&partnerID=8YFLogxK
U2 - 10.1145/3294052.3319680
DO - 10.1145/3294052.3319680
M3 - Conference contribution
T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
SP - 2
EP - 16
BT - PODS 2019 - Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
T2 - 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. PODS 2019, held in conjunction with the 2019 ACM SIGMOD International Conference on Management of Data, SIGMOD 2019
Y2 - 1 July 2019 through 3 July 2019
ER -