How Much Event Data Is Enough? A Statistical Framework for Process Discovery

Martin Bauer, Arik Senderovich, Avigdor Gal, Lars Grunske, Matthias Weidlich

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

Abstract

With the increasing availability of business-process-related event logs, the scalability of techniques that discover a process model from such logs becomes a performance bottleneck. In particular, exploratory analysis that investigates manifold parameter settings of discovery algorithms, potentially using a software-as-a-service tool, relies on fast response times. However, common approaches for process model discovery always parse and analyse all available event data, whereas a small fraction of a log could have already led to a high-quality model. In this paper, we therefore present a framework for process discovery that relies on statistical pre-processing of an event log and significantly reduces its size by means of sampling. It thereby reduces the runtime and memory footprint of process discovery algorithms, while providing guarantees on the introduced sampling error. Experiments with two public real-world event logs reveal that our approach speeds up state-of-the-art discovery algorithms by a factor of up to 20.
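
The idea of sampling a log until further traces stop contributing new information can be illustrated with a minimal sketch. This is not the procedure from the chapter: the stopping rule, the use of directly-follows pairs as the "information" in a trace, and the names `patience_for`, `directly_follows`, and `sample_log` are all assumptions made for illustration. The sketch also assumes traces are drawn independently, so that if a trace carried new information with probability at least `delta`, a run of `k` uninformative traces would occur with probability at most `(1 - delta) ** k`.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

Trace = Sequence[str]  # one case: an ordered sequence of activity labels


def patience_for(delta: float, alpha: float) -> int:
    """Smallest k with (1 - delta) ** k <= alpha (hypothetical stopping bound)."""
    return math.ceil(math.log(alpha) / math.log(1.0 - delta))


def directly_follows(trace: Trace) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where activity b immediately follows activity a in the trace."""
    return set(zip(trace, trace[1:]))


def sample_log(traces: List[Trace], patience: int, seed: int = 0) -> List[Trace]:
    """Draw traces in random order until `patience` consecutive traces
    add no previously unseen directly-follows pair (illustrative rule only)."""
    rng = random.Random(seed)
    order = list(traces)
    rng.shuffle(order)

    seen: Set[Tuple[str, str]] = set()
    sample: List[Trace] = []
    quiet = 0  # consecutive traces that contributed nothing new
    for trace in order:
        sample.append(trace)
        new_pairs = directly_follows(trace) - seen
        if new_pairs:
            seen |= new_pairs
            quiet = 0
        else:
            quiet += 1
            if quiet >= patience:
                break
    return sample


# Toy log with two variants; real logs would be parsed from event data.
log = [("a", "b", "c")] * 500 + [("a", "c", "b")] * 500
k = patience_for(delta=0.05, alpha=0.01)
reduced = sample_log(log, patience=k)
print(len(log), len(reduced))
```

A discovery algorithm would then be run on `reduced` instead of `log`. Choosing a smaller `delta` or `alpha` lengthens the required quiet run and therefore the sample, which is the trade-off between sample size and tolerated error in this sketch.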

Original language: English
Title of host publication: ADVANCED INFORMATION SYSTEMS ENGINEERING, CAISE 2018
Editors: John Krogstie, Hajo A. Reijers
Pages: 239-256
Number of pages: 18
Volume: 10816
State: Published - 2018
Event: 30th International Conference on Advanced Information Systems Engineering, CAiSE 2018 - Tallinn, Estonia
Duration: 11 Jun 2018 to 15 Jun 2018

Publication series

Name: Lecture Notes in Computer Science

Conference

Conference: 30th International Conference on Advanced Information Systems Engineering, CAiSE 2018
Country/Territory: Estonia
City: Tallinn
Period: 11/06/18 to 15/06/18

Keywords

  • Log pre-processing
  • Log sampling
  • Process discovery

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
