Abstract
Generalizing from detailed data to statements in a broader context is often critical for users to make sense of large data sets. However, poorly constructed generalizations can convey misleading information even when the statements are technically supported by the data. For example, a cherry-picked level of aggregation can obscure substantial sub-groups that oppose the generalization. We present a framework for detecting and explaining cherry-picked generalizations by refining aggregate queries. We propose a scoring method that indicates the appropriateness of a generalization and design efficient algorithms for score computation. To provide a better understanding of the resulting score, we also formulate practical explanation tasks that disclose significant counterexamples and suggest better alternatives to the statement. We conduct experiments on real-world data sets and examples to demonstrate the effectiveness of our proposed evaluation metric and the efficiency of our algorithmic framework.
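To make the idea of refining an aggregate query concrete, the minimal sketch below (not the paper's implementation; data, column names, and numbers are hypothetical) shows how a coarse aggregation can support a claim while a one-level drill-down exposes a substantial sub-group that opposes it.

```python
# Illustrative sketch: a coarse aggregate supports "average salary went up",
# but refining the aggregation by department reveals an opposing sub-group.
# All data and column names here are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "year":   [2020] * 4 + [2021] * 4,
    "dept":   ["A", "A", "B", "B"] * 2,
    "salary": [50, 52, 90, 92,      # 2020 records
               58, 60, 86, 88],     # 2021 records
})

# Coarse aggregation: average salary per year.
coarse = df.groupby("year")["salary"].mean()
print(coarse)    # 2020: 71.0, 2021: 73.0  -> appears to support the claim

# Refined aggregation: average salary per (dept, year).
refined = df.groupby(["dept", "year"])["salary"].mean().unstack("year")
print(refined)
# dept A rises (51 -> 59) while dept B falls (91 -> 87): a statement based
# only on the coarse level hides the sub-group that opposes it.
```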
Original language | English |
---|---|
Pages (from-to) | 59-71 |
Number of pages | 13 |
Journal | Proceedings of the VLDB Endowment |
Volume | 15 |
Issue number | 1 |
DOIs | |
State | Published - 1 Jan 2021 |
Event | 48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia. Duration: 5 Sep 2022 → 9 Sep 2022 |
All Science Journal Classification (ASJC) codes
- Computer Science (miscellaneous)
- General Computer Science