Providing concise database covers instantly by recursive tile sampling

Sandy Moens, Mario Boley, Bart Goethals

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Known pattern discovery algorithms for finding tilings (covers of 0/1-databases consisting of 1-rectangles) cannot be integrated in instant and interactive KD tools, because they do not satisfy at least one of two key requirements: a) to provide results within a short response time of only a few seconds and b) to return a concise set of patterns with only a few elements that nevertheless covers a large fraction of the input database. In this paper we present a novel randomized algorithm that works well under these requirements. It is based on the recursive application of a simple tile sample procedure that can be implemented efficiently using rejection sampling. While, as we analyse, the theoretical solution distribution can be weak in the worst case, the approach performs very well in practice and outperforms previous sampling as well as deterministic algorithms.

Original language: American English
Title of host publication: Discovery Science - 17th International Conference, DS 2014, Proceedings
Editors: Sašo Džeroski, Panče Panov, Dragi Kocev, Ljupčo Todorovski
Publisher: Springer Verlag
Pages: 216-227
Number of pages: 12
ISBN (Electronic): 9783319118116
State: Published - 2014
Externally published: Yes
Event: 17th International Conference on Discovery Science, DS 2014 - Bled, Slovenia
Duration: 8 Oct 2014 to 10 Oct 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8777

Conference

Conference: 17th International Conference on Discovery Science, DS 2014
Country/Territory: Slovenia
City: Bled
Period: 8/10/14 to 10/10/14

Keywords

  • Instant Pattern Mining
  • Sampling Closed Itemsets
  • Tiling Databases

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
