TESTING FOR OUTLIERS WITH CONFORMAL P-VALUES

Stephen Bates, Emmanuel Candès, Lihua Lei, Yaniv Romano, Matteo Sesia

Research output: Contribution to journal › Article › peer-review

Abstract

This paper studies the construction of p-values for nonparametric outlier detection from a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a general framework yielding p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by experiments on real and simulated data.
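As a concrete illustration of the split-conformal construction described in the abstract, the sketch below computes a marginal conformal p-value for each test point and then applies Benjamini-Hochberg for false discovery rate control. This is a minimal sketch under stated assumptions, not the authors' implementation: the IsolationForest score function, the toy Gaussian data, and the level alpha = 0.1 are all illustrative choices.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical toy data: Gaussian reference (inlier) set and a test batch
# containing a few mean-shifted outliers.
ref = rng.normal(size=(2000, 5))
test = np.vstack([rng.normal(size=(90, 5)),            # inliers
                  rng.normal(loc=3.0, size=(10, 5))])  # outliers

# Split the reference set: fit the score on one half, calibrate on the other.
n_train = len(ref) // 2
model = IsolationForest(random_state=0).fit(ref[:n_train])

# Nonconformity scores: higher = more "outlying" (flip sklearn's sign).
cal_scores = -model.score_samples(ref[n_train:])
test_scores = -model.score_samples(test)

# Split-conformal p-value for test point j:
#   p_j = (1 + #{calibration scores >= test score_j}) / (n_cal + 1).
n_cal = len(cal_scores)
pvals = (1 + (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1)) / (n_cal + 1)

# Benjamini-Hochberg at level alpha: reject the k smallest p-values, where
# k is the largest index with p_(k) <= k * alpha / m.
alpha = 0.1
m = len(pvals)
order = np.argsort(pvals)
below = pvals[order] <= alpha * np.arange(1, m + 1) / m
k = below.nonzero()[0].max() + 1 if below.any() else 0
rejected = np.zeros(m, dtype=bool)
rejected[order[:k]] = True
print(f"flagged {rejected.sum()} of {m} test points as outliers")
```

Note that all test points share the same calibration scores, which is the source of the mutual dependence the abstract refers to; the paper shows this dependence is positive, which is what licenses Benjamini-Hochberg here, while the conditional construction it introduces removes the dependence altogether.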

Original language: English
Pages (from-to): 149-178
Number of pages: 30
Journal: Annals of Statistics
Volume: 51
Issue number: 1
DOIs
State: Published - Feb 2023

Keywords

  • Conformal inference
  • false discovery rate
  • out-of-distribution
  • positive dependence

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
