The complexity of mining maximal frequent subgraphs

Benny Kimelfeld, Phokion G. Kolaitis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

A frequent subgraph of a given collection of graphs is a graph that is isomorphic to a subgraph of at least as many graphs in the collection as a given threshold. Frequent subgraphs generalize frequent itemsets and arise in various contexts, from bioinformatics to the Web. Since the space of frequent subgraphs is typically extremely large, research in graph mining has focused on special types of frequent subgraphs that can be orders of magnitude smaller in number, yet encapsulate the space of all frequent subgraphs. Maximal frequent subgraphs (i.e., the ones not properly contained in any frequent subgraph) constitute the most useful such type. In this paper, we embark on a comprehensive investigation of the computational complexity of mining maximal frequent subgraphs. Our study is carried out by considering the effect of three different parameters: possible restrictions on the class of graphs; a fixed bound on the threshold; and a fixed bound on the number of desired answers. We focus on specific classes of connected graphs: general graphs, planar graphs, graphs of bounded degree, and graphs of bounded tree-width (trees being a special case). Moreover, each class has two variants: the one in which the nodes are unlabeled, and the one in which they are uniquely labeled. We delineate the complexity of the enumeration problem for each of these variants by determining when it is solvable in (total or incremental) polynomial time and when it is NP-hard. Specifically, for the labeled classes, we show that bounding the threshold yields tractability but, in most cases, bounding the number of answers does not, unless P=NP; an exception is the case of labeled trees, where bounding either of these two parameters yields tractability. The state of affairs turns out to be quite different for the unlabeled classes. 
The main (and most challenging to prove) result concerns unlabeled trees: we show NP-hardness, even if the input consists of two trees, and both the threshold and the number of desired answers are equal to just two. In other words, we establish that the following problem is NP-complete: given two unlabeled trees, do they have more than one maximal subtree in common?
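To make the definitions in the abstract concrete, here is a minimal brute-force sketch for the uniquely labeled case only. It is an illustration, not the authors' algorithm, and it runs in exponential time. It assumes simple graphs without self-loops, represents each graph as a set of labeled edges, and ignores patterns that consist of a single node. Under unique labels, "isomorphic to a subgraph" reduces to edge-set containment. The helper names (`edge`, `graph`, `support`, `maximal_frequent_subgraphs`) are hypothetical.

```python
from itertools import combinations


def edge(u, v):
    """An undirected edge between two uniquely labeled nodes (no self-loops)."""
    return frozenset((u, v))


def graph(*pairs):
    """Build a graph as a frozenset of edges from (u, v) label pairs."""
    return frozenset(edge(u, v) for u, v in pairs)


def is_connected(edges):
    """True if the non-empty edge set forms a single connected component."""
    adjacency = {}
    for e in edges:
        u, v = tuple(e)
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    if not adjacency:
        return False
    start = next(iter(adjacency))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(adjacency)


def support(pattern, graphs):
    """Number of graphs in the collection that contain the pattern."""
    return sum(1 for g in graphs if pattern <= g)


def maximal_frequent_subgraphs(graphs, threshold):
    """Enumerate maximal frequent connected subgraphs by exhaustive search."""
    # Only edges that are frequent on their own can occur in a frequent pattern.
    candidates = sorted(
        {e for g in graphs for e in g
         if support(frozenset([e]), graphs) >= threshold},
        key=sorted,
    )
    frequent = []
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            pattern = frozenset(combo)
            if is_connected(pattern) and support(pattern, graphs) >= threshold:
                frequent.append(pattern)
    # Keep the patterns not properly contained in another frequent pattern.
    return [p for p in frequent if not any(p < q for q in frequent)]


g1 = graph(("a", "b"), ("b", "c"), ("c", "d"))
g2 = graph(("a", "b"), ("b", "c"), ("d", "e"))
g3 = graph(("a", "b"), ("c", "d"), ("d", "e"))

for pattern in maximal_frequent_subgraphs([g1, g2, g3], threshold=2):
    print(sorted(sorted(e) for e in pattern))
```

With a threshold of 2, this toy collection has three maximal frequent subgraphs: the path a-b-c and the single edges c-d and d-e. The edges a-b and b-c are frequent on their own but not maximal, since both are contained in the frequent path a-b-c. In the unlabeled case the containment test becomes subgraph isomorphism, so this shortcut no longer applies.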

Original language: English
Title of host publication: PODS 2013 - Proceedings of the 32nd Symposium on Principles of Database Systems
Pages: 13-24
Number of pages: 12
State: Published - 2013
Externally published: Yes
Event: 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013 - New York, NY, United States
Duration: 22 Jun 2013 to 27 Jun 2013

Publication series

Name: Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

Conference

Conference: 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013
Country/Territory: United States
City: New York, NY
Period: 22/06/13 to 27/06/13

Keywords

  • Enumeration complexity
  • Graph mining
  • Maximal frequent subgraphs

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture
