CFDB: Machine Learning Model Analysis via Databases of CounterFactuals

Idan Meyuhas, Aviv Ben Arie, Yair Horesh, Daniel Deutch

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Data Scientists often design and train complex Machine Learning models that evolve over time due to re-training on new data, a revised architecture, or both. To assist Data Scientists in this process, many methods for analyzing models have recently been developed. A prominent approach for model analysis is based on the notion of Counterfactuals. A counterfactual (CF) intuitively explains the label assigned by the model to a particular instance by identifying perturbations to the instance that lead to a different predicted label. A large body of recent literature has demonstrated the usefulness of CFs for deriving insights on the model at large. The analyzed CFs come in various flavors and are applied to instances chosen based on various criteria, in the context of different analysis goals. In this work we propose to demonstrate CFDB (Counterfactuals Database), a unified framework for querying Counterfactuals. CFDB consolidates common approaches to CF-based analysis and provides multiple levels of abstraction in a relational framework. We will demonstrate CFDB in the context of the Lending Club Loan Data, showing its usefulness by formulating and executing multiple analyses over evolving classifiers for Loan Approval.
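
To make the abstract's notion of a counterfactual concrete, the following is a minimal sketch, not CFDB's actual API, schema, or search method. Everything in it is an assumption made for illustration: the toy `predict` rule, the `find_counterfactual` brute-force search, the `cf` table layout, and the sample instances. It finds, for each rejected loan instance, the smallest income increase that flips the predicted label, stores the result in a relational table, and queries it with SQL.

```python
import sqlite3

# Hypothetical toy loan classifier (an assumption, not the paper's model).
def predict(income, debt):
    return "approve" if income - 2 * debt >= 50 else "reject"

# Hypothetical brute-force search: raise income in fixed steps until the
# predicted label changes. Returns the perturbed instance, or None if no
# flip is found within max_steps.
def find_counterfactual(income, debt, step=5, max_steps=20):
    original = predict(income, debt)
    for k in range(1, max_steps + 1):
        candidate = income + k * step
        if predict(candidate, debt) != original:
            return candidate, debt
    return None

# Made-up instances: (id, income, debt).
instances = [(1, 40, 10), (2, 60, 5), (3, 30, 0)]

# Store each instance next to its counterfactual in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cf (id INTEGER, income REAL, debt REAL, label TEXT, "
    "cf_income REAL, cf_label TEXT)"
)
for iid, income, debt in instances:
    cf = find_counterfactual(income, debt)
    if cf is not None:
        conn.execute(
            "INSERT INTO cf VALUES (?, ?, ?, ?, ?, ?)",
            (iid, income, debt, predict(income, debt), cf[0], predict(*cf)),
        )

# Query: which rejected applicants are closest to approval?
rows = conn.execute(
    "SELECT id, cf_income - income AS delta FROM cf "
    "WHERE label = 'reject' ORDER BY delta"
).fetchall()
print(rows)
```

Once counterfactuals sit in a table, questions such as "which rejected instances need the smallest change" become ordinary queries, which is the general idea behind analyzing counterfactuals relationally.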

Original language: English
Title of host publication: SIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data
Pages: 2401-2404
Number of pages: 4
ISBN (Electronic): 9781450392495
State: Published - 10 Jun 2022
Event: 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022 - Virtual, Online, United States
Duration: 12 Jun 2022 to 17 Jun 2022

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data

Conference

Conference: 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022
Country/Territory: United States
City: Virtual, Online
Period: 12/06/22 to 17/06/22

Keywords

  • counterfactuals
  • databases

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
