Deriving probabilistic databases with inference ensembles

Julia Stoyanovich, Susan Davidson, Tova Milo, Val Tannen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Many real-world applications deal with uncertain or missing data, prompting a surge of activity in the area of probabilistic databases. A shortcoming of prior work is the assumption that an appropriate probabilistic model, along with the necessary probability distributions, is given. We address this shortcoming by presenting a framework for learning a set of inference ensembles, termed meta-rule semi-lattices, or MRSL, from the complete portion of the data. We use the MRSL to infer probability distributions for missing data, and demonstrate experimentally that high accuracy is achieved when a single attribute value is missing per tuple. We next propose an inference algorithm based on Gibbs sampling that accurately predicts the probability distribution for multiple missing values. We also develop an optimization that greatly improves performance of multi-attribute inference for collections of tuples, while maintaining high accuracy. Finally, we develop an experimental framework to evaluate the efficiency and accuracy of our approach.

Original language: English
Title of host publication: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
Pages: 303-314
Number of pages: 12
State: Published - 2011
Event: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011 - Hannover, Germany
Duration: 11 Apr 2011 - 16 Apr 2011

Publication series

Name: Proceedings - International Conference on Data Engineering

Conference

Conference: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
Country/Territory: Germany
City: Hannover
Period: 11/04/11 - 16/04/11

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems
