Abstract
In this article we develop a method based on model-X knockoffs to find conditional associations that are consistent across environments, while controlling the false discovery rate. The motivation for this problem is that large datasets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associations replicated under different conditions may be more interesting. In fact, sometimes consistency provably leads to valid causal inferences even if conditional associations do not. Although the proposed method is widely applicable, in this paper we highlight its relevance to genome-wide association studies, in which robustness across populations with diverse ancestries mitigates confounding due to unmeasured variants. The effectiveness of this approach is demonstrated by simulations and applications to UK Biobank data.
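For readers unfamiliar with knockoff-based selection, the sketch below shows the standard single-environment knockoff+ selection step, in which features whose importance statistics exceed a data-dependent threshold are reported at a nominal false discovery rate level. It is not the multi-environment procedure developed in the article, which the abstract does not spell out. The function name `knockoff_plus_select`, the statistic vector `W`, and the level `q` are illustrative assumptions, and the statistics are assumed to have been computed already from the original variables and their knockoffs.

```python
import numpy as np


def knockoff_plus_select(W, q=0.1):
    """Illustrative knockoff+ selection (hypothetical helper, not from the article).

    W : feature statistics; a large positive value suggests the original
        variable is more important than its knockoff copy.
    q : nominal false discovery rate level.
    Returns the indices of the selected features.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        # Conservative estimate of the false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    # No threshold meets the target level, so nothing is selected.
    return np.array([], dtype=int)


# Made-up statistics, for illustration only.
W_example = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, 0.2, -0.1]
selected = knockoff_plus_select(W_example, q=0.2)
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff threshold; it makes the estimate slightly conservative, so very few discoveries are possible when the number of features is small relative to 1/q.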
Field | Value |
---|---|
Pages (from-to) | 611-629 |
Number of pages | 19 |
Journal | Biometrika |
Volume | 109 |
Issue number | 3 |
State | Published - 1 Sep 2022 |
Keywords
- Causality
- Conditional independence
- False discovery rate
- Genome-wide association study
All Science Journal Classification (ASJC) codes
- Applied Mathematics
- Agricultural and Biological Sciences (miscellaneous)
- General Agricultural and Biological Sciences
- Statistics and Probability
- Statistics, Probability and Uncertainty
- General Mathematics