A Data-driven Missing Mass Estimation Framework

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Consider a finite sample from an unknown distribution over a countable alphabet. The missing mass is the total probability of the symbols that do not appear in the sample. Missing mass estimation is a fundamental problem in statistics, information theory, and related fields; it dates back to the early work of Laplace and to the more recent seminal contribution of Good and Turing. Most popular missing mass estimation schemes are universal, in the sense that they perform well for every possible distribution. Interestingly, the distribution on which these schemes perform worst is known to be uniform. Real-world distributions, on the other hand, are typically heavy-tailed. This means that current frameworks may be overly pessimistic in many cases of interest. In this work we propose a data-dependent estimation scheme to address this caveat. Specifically, we infer a subset of distributions from the sample and control the worst-case performance only over that subset. Our proposed scheme demonstrates improved performance guarantees compared to alternative methods.
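To make the quantities in the abstract concrete, the sketch below computes the classical Good-Turing missing mass estimate (the fraction of the sample made up of symbols seen exactly once, N1/n) and compares it with the true missing mass on a uniform and on a Zipf (heavy-tailed) distribution. This is only the universal baseline the abstract refers to, not the data-dependent scheme proposed in this work. The helper names (`good_turing_missing_mass`, `zipf_probs`, etc.), the alphabet size, the sample size, and the Zipf exponent are illustrative assumptions.

```python
from collections import Counter
import random


def good_turing_missing_mass(sample):
    """Good-Turing estimate of the missing mass: N1 / n, where N1 is the
    number of distinct symbols that appear exactly once in the sample."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n


def true_missing_mass(sample, probs):
    """Exact missing mass: total probability of the symbols absent from the
    sample. `probs` maps each symbol to its probability."""
    seen = set(sample)
    return sum(p for s, p in probs.items() if s not in seen)


def uniform_probs(k):
    """Uniform distribution over symbols 0..k-1."""
    return {i: 1.0 / k for i in range(k)}


def zipf_probs(k, s=1.0):
    """Zipf-like (heavy-tailed) distribution over k symbols: p_i ∝ 1 / i^s."""
    weights = [1.0 / (i ** s) for i in range(1, k + 1)]
    total = sum(weights)
    return {i: w / total for i, w in enumerate(weights)}


def mean_abs_error(probs, n, trials, rng):
    """Average |Good-Turing estimate - true missing mass| over repeated
    samples of size n drawn i.i.d. from `probs`."""
    symbols = list(probs)
    weights = [probs[s] for s in symbols]
    err = 0.0
    for _ in range(trials):
        sample = rng.choices(symbols, weights=weights, k=n)
        err += abs(good_turing_missing_mass(sample) - true_missing_mass(sample, probs))
    return err / trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    k, n, trials = 1000, 500, 200
    print("uniform:", mean_abs_error(uniform_probs(k), n, trials, rng))
    print("zipf:   ", mean_abs_error(zipf_probs(k), n, trials, rng))
```

Running the script prints the empirical error of the universal estimator under each distribution; varying `k`, `n`, and the Zipf exponent `s` is a simple way to explore how the sample's shape affects the estimation error.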

Original language: English
Title of host publication: 2022 IEEE International Symposium on Information Theory, ISIT 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2991-2995
Number of pages: 5
ISBN (Electronic): 9781665421591
DOIs
State: Published - 2022
Event: 2022 IEEE International Symposium on Information Theory, ISIT 2022 - Espoo, Finland
Duration: 26 Jun 2022 to 1 Jul 2022

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
Volume: 2022-June

Conference

Conference: 2022 IEEE International Symposium on Information Theory, ISIT 2022
Country/Territory: Finland
City: Espoo
Period: 26/06/22 to 1/07/22

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Information Systems
  • Modelling and Simulation
  • Applied Mathematics
