TY - CHAP
T1 - Mean Tail
T2 - Top-K and Frequency Estimation with Fewer Counters and More Keys
AU - Biton, Dvir
AU - Friedman, Roy
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - In the domain of flow frequency estimation, counter-based methods such as Frequent and Space Saving provide a better ratio of space to approximation error guarantee than sketch-based approaches, especially when flow identifiers are not very large. Among counter-based methods, RAP appears to be the best. Traditional counter-based techniques are indifferent to the stream’s distribution. Yet, there are many practical settings in which the distribution is long tailed, i.e., the majority of elements belong to the long tail and share similar frequencies. In this work, we aim to provide even better space to approximation error ratios for such distributions. Specifically, we propose allocating a dedicated memory section to track the tail frequencies. Instead of maintaining an individual counter for each element in the tail, we use a single aggregate counter to represent the entire tail, allowing us to estimate an average count for all tail elements. This approach works best when the tail is close to uniform. By saving memory previously allocated to individual tail counters, we can double the number of keys tracked in the tail section, thereby improving the overall accuracy for these low-frequency elements. To that end, we present Mean Tail (MT), a novel counter-based data structure that supports updates and queries while specifically targeting the tail of the tracked keys and offering better accuracy than RAP when sufficiently large. For top-K heavy hitters, MT’s recall is also better than RAP’s. All our code is open sourced [2].
UR - http://www.scopus.com/inward/record.url?scp=105003416357&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-87766-7_20
DO - 10.1007/978-3-031-87766-7_20
M3 - Chapter
T3 - Lecture Notes on Data Engineering and Communications Technologies
SP - 222
EP - 233
BT - Lecture Notes on Data Engineering and Communications Technologies
PB - Springer Science and Business Media Deutschland GmbH
ER -