Non-Bayesian Parametric Missing-Mass Estimation

Shir Cohen, Tirza Routtenberg, Lang Tong

Research output: Contribution to journal › Article › peer-review

Abstract

We consider the classical problem of missing-mass estimation, which deals with estimating the total probability of unseen elements in a sample. The missing-mass estimation problem has various applications in machine learning, statistics, language processing, ecology, sensor networks, and other fields. The naive constrained maximum likelihood (CML) estimator is inappropriate for this problem since it tends to overestimate the probability of the observed elements. Similarly, the constrained Cramér-Rao bound (CCRB), which is a lower bound on the mean-squared-error (MSE) of unbiased estimators of the entire probability mass function (pmf) vector, does not provide a relevant bound for missing-mass estimation. In this paper, we introduce a non-Bayesian parametric model of the problem of missing-mass estimation. We introduce the concept of missing-mass unbiasedness by using the Lehmann unbiasedness definition. We derive a non-Bayesian CCRB-type lower bound on the missing-mass MSE (mmMSE), named the missing-mass CCRB (mmCCRB), based on the missing-mass unbiasedness. The proposed mmCCRB can be used for system design and for the performance evaluation of existing estimators. Moreover, based on the mmCCRB, we propose a new method to improve estimators by an iterative missing-mass Fisher-scoring method. Finally, we demonstrate via numerical simulations that the biased mmCCRB is a valid and informative lower bound on the mmMSE of state-of-the-art estimators for this problem: the CML, asymptotic profile maximum likelihood (aPML), Good-Turing, and Laplace estimators. We also show that the mmMSE and missing-mass bias of the Laplace estimator are reduced by using the new missing-mass Fisher-scoring method.
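
To make the quantity being estimated concrete, the following is a minimal illustrative sketch and is not taken from the paper. It computes the true missing mass of a sample under a known pmf and compares it with the textbook forms of three estimators named in the abstract: the empirical (maximum likelihood) estimate, the Good-Turing estimate (number of singletons divided by the sample size), and the add-one Laplace estimate. The function names, the toy pmf, and the sample are assumptions chosen for illustration; the paper's exact estimator definitions, the mmCCRB, and the Fisher-scoring method are not reproduced here.

```python
from collections import Counter


def true_missing_mass(pmf, sample):
    """Total probability of the symbols that never appear in the sample."""
    seen = set(sample)
    return sum(p for symbol, p in pmf.items() if symbol not in seen)


def ml_missing_mass(sample):
    """Empirical frequencies put all mass on observed symbols, so the estimate is 0."""
    return 0.0


def good_turing_missing_mass(sample):
    """Textbook Good-Turing estimate: (number of symbols seen exactly once) / n."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


def laplace_missing_mass(sample, alphabet_size):
    """Add-one smoothing: each unseen symbol receives probability 1 / (n + M)."""
    n = len(sample)
    unseen = alphabet_size - len(set(sample))
    return unseen / (n + alphabet_size)


# Hypothetical toy example: four symbols, six observations, "d" never observed.
pmf = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
sample = ["a", "a", "b", "a", "c", "b"]

print("true missing mass:", true_missing_mass(pmf, sample))
print("ML estimate:      ", ml_missing_mass(sample))
print("Good-Turing:      ", good_turing_missing_mass(sample))
print("Laplace:          ", laplace_missing_mass(sample, len(pmf)))
```

In this toy case the empirical estimate assigns zero probability to the unseen symbol, which is the overestimation of observed elements that the abstract describes.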

Original language: American English
Pages (from-to): 3709-3725
Number of pages: 17
Journal: IEEE Transactions on Signal Processing
Volume: 70
State: Published - 1 Jan 2022

Keywords

  • Good-Turing estimator
  • Lehmann unbiasedness
  • Non-Bayesian estimation
  • constrained Cramér-Rao bound
  • probability of missing mass

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering
