Algorithms and estimators for summarization of unaggregated data streams

Edith Cohen, Nick Duffield, Haim Kaplan, Carsten Lund, Mikkel Thorup

Research output: Contribution to journal › Article › peer-review


Statistical summaries of IP traffic are at the heart of network operation and are used to recover aggregate information on subpopulations of flows. It is therefore of great importance to collect the most accurate and informative summaries given the router's resource constraints. A summarization algorithm, such as Cisco's sampled NetFlow, is applied to IP packet streams that consist of multiple interleaving IP flows. We develop sampling algorithms and unbiased estimators that address sources of inefficiency in current methods. First, we design tunable algorithms, whereas currently a single parameter (the sampling rate) controls utilization of both memory and processing/access speed, which means that it has to be set according to the bottleneck resource. Second, we make better use of the memory hierarchy, which involves exporting partial summaries to slower storage during the measurement period.
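For orientation, the baseline that the abstract refers to can be illustrated with a minimal sketch of independent packet sampling in the style of sampled NetFlow, together with the standard scale-by-1/p estimator for a subpopulation's packet count. This is not one of the paper's algorithms. The function names, the representation of a stream as a list of flow identifiers, and the predicate interface are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_packets(packets, p, rng):
    """Keep each packet independently with probability p and count
    the sampled packets per flow (hypothetical helper, not from the paper)."""
    counts = defaultdict(int)
    for flow_id in packets:
        if rng.random() < p:
            counts[flow_id] += 1
    return dict(counts)


def estimate_subpopulation(counts, p, predicate):
    """Estimate the total number of packets in flows that satisfy
    `predicate` by scaling the sampled count by 1/p. Each packet is
    kept with probability p, so the scaled count is unbiased."""
    sampled = sum(c for flow_id, c in counts.items() if predicate(flow_id))
    return sampled / p


if __name__ == "__main__":
    # Interleaved stream of three flows; flow ids are illustrative only.
    stream = ["a", "b", "a", "c"] * 500
    rng = random.Random(0)
    counts = sample_packets(stream, p=0.1, rng=rng)
    print(estimate_subpopulation(counts, 0.1, lambda f: f == "a"))
```

In this sketch the single parameter p determines both how many packets are processed and how many flow counters are held in memory, which is the coupling the abstract describes as a source of inefficiency.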

Original language: English
Pages (from-to): 1214-1244
Number of pages: 31
Journal: Journal of Computer and System Sciences
Issue number: 7
State: Published - Nov 2014


Keywords

  • Data streams
  • Flow size distribution
  • IP flows
  • NetFlow
  • Random sampling
  • Subpopulation queries

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
  • Applied Mathematics
  • Computer Networks and Communications
  • Computational Theory and Mathematics

