TY - GEN
T1 - Querying Incomplete Numerical Data
T2 - 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2023
AU - Console, Marco
AU - Libkin, Leonid
AU - Peterfreund, Liat
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/6/18
Y1 - 2023/6/18
N2 - Queries with aggregation and arithmetic operations, as well as incomplete data, are common in real-world databases, but we lack a good understanding of how they should interact. On the one hand, systems based on SQL provide ad-hoc rules for numerical nulls; on the other, theoretical research largely concentrates on the standard notions of certain and possible answers. In the presence of numerical attributes and aggregates, however, these answers are often meaningless, returning either too little or too much. Our goal is to define a principled framework for databases with numerical nulls and for answering queries with arithmetic and aggregation over them. Towards this goal, we assume that missing values in numerical attributes are given by probability distributions associated with marked nulls. This yields a model of probabilistic bag databases in which tuples are not necessarily independent, since nulls can repeat. We provide a general compositional framework for query answering and then concentrate on queries that resemble standard SQL with arithmetic and aggregation. We show that these queries are measurable and that their outputs have a finite representation. Moreover, since the classical forms of answers provide little information in the numerical setting, we look at the probability that numerical values in output tuples belong to specific intervals. Even though the exact computation of these probabilities is intractable, we show efficient approximation algorithms to compute them.
AB - Queries with aggregation and arithmetic operations, as well as incomplete data, are common in real-world databases, but we lack a good understanding of how they should interact. On the one hand, systems based on SQL provide ad-hoc rules for numerical nulls; on the other, theoretical research largely concentrates on the standard notions of certain and possible answers. In the presence of numerical attributes and aggregates, however, these answers are often meaningless, returning either too little or too much. Our goal is to define a principled framework for databases with numerical nulls and for answering queries with arithmetic and aggregation over them. Towards this goal, we assume that missing values in numerical attributes are given by probability distributions associated with marked nulls. This yields a model of probabilistic bag databases in which tuples are not necessarily independent, since nulls can repeat. We provide a general compositional framework for query answering and then concentrate on queries that resemble standard SQL with arithmetic and aggregation. We show that these queries are measurable and that their outputs have a finite representation. Moreover, since the classical forms of answers provide little information in the numerical setting, we look at the probability that numerical values in output tuples belong to specific intervals. Even though the exact computation of these probabilities is intractable, we show efficient approximation algorithms to compute them.
KW - aggregate queries
KW - approximations
KW - certain and possible answers
KW - nulls
KW - numerical attributes
KW - probabilistic databases
UR - http://www.scopus.com/inward/record.url?scp=85164270320&partnerID=8YFLogxK
U2 - 10.1145/3584372.3588660
DO - 10.1145/3584372.3588660
M3 - Conference contribution
T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
SP - 349
EP - 358
BT - PODS 2023 - Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
Y2 - 18 June 2023 through 23 June 2023
ER -