Abstract
Identifying frequent item-sets is a popular data-mining task. It consists of finding sets of items that appear frequently in data. Yet finding all frequent item-sets in large or dense datasets may be time-consuming, and a user may be interested in only a few specific item-sets rather than all of them. Recently, methods have been proposed for targeted item-set mining, that is, calculating the support of specific item-sets of interest. Although this approach is often more suitable for real applications than traditional item-set mining, performance remains an issue. To address it, this paper presents a novel algorithm for multitude-targeted mining, named Guided Frequent Pattern-Growth (GFP-Growth). GFP-Growth is designed to quickly mine a given set of item-sets using a small amount of memory. This paper proves that GFP-Growth yields the exact frequency counts for each item-set of interest. It further shows that GFP-Growth can boost performance for several problems requiring item-set mining. We specifically study the problem of generating minority-class rules from imbalanced data and develop the Minority-Report Algorithm (MRA), which uses GFP-Growth to solve this problem efficiently. We prove several theoretical properties of MRA and present experimental results showing substantial performance gains.
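To make the task concrete, the following is a minimal Python sketch of targeted item-set mining in its most naive form: given a list of transactions and a list of item-sets of interest, it counts how many transactions contain each target. It is not GFP-Growth and does not reflect the paper's data structures; the function name `targeted_support` and the toy grocery data are assumptions made purely for illustration. The sketch only shows the quantity that a targeted miner is expected to return exactly.

```python
from typing import Dict, FrozenSet, Iterable, List


def targeted_support(
    transactions: Iterable[Iterable[str]],
    targets: Iterable[Iterable[str]],
) -> Dict[FrozenSet[str], int]:
    """Naive baseline (hypothetical helper, not the paper's algorithm):
    count the transactions that contain each target item-set."""
    target_sets: List[FrozenSet[str]] = [frozenset(t) for t in targets]
    counts: Dict[FrozenSet[str], int] = {t: 0 for t in target_sets}
    for transaction in transactions:
        items = set(transaction)
        for target in target_sets:
            # A transaction supports a target if it contains every item in it.
            if target <= items:
                counts[target] += 1
    return counts


# Toy data, invented for this example only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
targets = [["bread", "milk"], ["butter"]]

counts = targeted_support(transactions, targets)
for target, count in counts.items():
    print(sorted(target), count)
```

This baseline checks every target against every transaction, so its cost grows with the product of the two. That cost is the kind of performance issue that motivates a guided, tree-based approach.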
Original language | American English |
---|---|
Pages (from-to) | 353-375 |
Number of pages | 23 |
Journal | Information Sciences |
Volume | 553 |
State | Published - Apr 2021 |
Keywords
- Data mining
- Guided FP-Growth
- Imbalanced data
- Item-set discovery
- Minority-class rule
- Multi-targeted mining
All Science Journal Classification (ASJC) codes
- Software
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence