
Mean Tail: Top-K and Frequency Estimation with Fewer Counters and More Keys

Dvir Biton, Roy Friedman

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

In the domain of flow frequency estimation, counter-based methods such as Frequent and Space Saving offer a better trade-off between space and guaranteed approximation error than sketch-based approaches, especially when flow identifiers are not very large. Among counter-based methods, RAP appears to be the best. Traditional counter-based techniques are indifferent to the stream’s distribution. Yet in many practical settings the distribution is long-tailed, i.e., the majority of elements belong to the long tail and share similar frequencies. In this work, we aim to provide even better space-to-approximation-error ratios for such distributions. Specifically, we propose allocating a dedicated memory section to track the tail frequencies. Instead of maintaining an individual counter for each element in the tail, however, we use a single aggregate counter to represent the entire tail, which lets us estimate an average count for all tail elements. This approach works best when the tail is close to uniform. By saving the memory previously allocated to individual tail counters, we can double the number of keys tracked in the tail section, thereby improving the overall accuracy for these low-frequency elements. To that end, we present Mean Tail (MT), a novel counter-based data structure that supports updates and queries while specifically targeting the tail of the tracked keys, and that offers better accuracy than RAP when sufficiently large. For top-K heavy hitters, MT’s recall is also better than RAP’s. All our code is open source [2].
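The aggregate-counter idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' MT algorithm: the class name `MeanTailSketch`, the parameters `head_size` and `tail_size`, and the policy of dropping untracked keys are all assumptions made for illustration, and the sketch omits any promotion or eviction between sections. It only shows how a set of tail keys can share one counter and be answered with the tail's mean.

```python
class MeanTailSketch:
    """Toy illustration (hypothetical, not the paper's MT structure):
    individual counters for a 'head' section, one shared counter for a 'tail'."""

    def __init__(self, head_size, tail_size):
        self.head_size = head_size
        self.tail_size = tail_size
        self.head = {}        # key -> individual counter
        self.tail = set()     # tail keys, stored without per-key counters
        self.tail_total = 0   # single aggregate counter for the whole tail

    def update(self, key):
        if key in self.head:
            self.head[key] += 1
        elif len(self.head) < self.head_size:
            self.head[key] = 1
        elif key in self.tail:
            self.tail_total += 1
        elif len(self.tail) < self.tail_size:
            self.tail.add(key)
            self.tail_total += 1
        # Assumption: once both sections are full, unseen keys are ignored.

    def query(self, key):
        if key in self.head:
            return float(self.head[key])
        if key in self.tail:
            # Every tail key gets the same estimate: the tail's mean count.
            return self.tail_total / len(self.tail)
        return 0.0


sketch = MeanTailSketch(head_size=2, tail_size=2)
for k in ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"] * 4 + ["e"]:
    sketch.update(k)

print(sketch.query("a"))  # 5.0 (exact head counter)
print(sketch.query("c"))  # 3.0 (true count 2, tail mean (2 + 4) / 2)
print(sketch.query("d"))  # 3.0 (true count 4, same tail mean)
```

In this toy run the tail estimate is off by one for both `c` and `d`, which matches the abstract's remark that the approach works best when the tail is close to uniform. Because tail keys carry no per-key counter, the same memory can hold more keys, which is the trade the abstract describes.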

Original language: English
Title of host publication: Lecture Notes on Data Engineering and Communications Technologies
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 222-233
Number of pages: 12
State: Published - 2025

Publication series

Name: Lecture Notes on Data Engineering and Communications Technologies
Volume: 246

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Computer Networks and Communications
  • Electrical and Electronic Engineering
