H-Mine: Fast and space-preserving frequent pattern mining in large databases
In this study, we propose a simple and novel data structure using hyper-links, H-struct, and a new mining algorithm, H-mine, which takes advantage of this data structure and dynamically adjusts links in the mining process. A distinct feature of this method is that it has a very limited and precisely predictable main memory cost and runs very quickly in memory-based settings. Moreover, it can be scaled up to very large databases using database partitioning. When the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process. Our study shows that H-mine has an excellent performance for various kinds of data, outperforms currently available algorithms in different settings, and is highly scalable to mining large databases. This study also proposes a new data mining methodology, space-preserving mining, which may have a major impact on the future development of efficient and scalable data mining methods.
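The core idea in the abstract, mining through links into a single stored copy of the transactions instead of materializing projected databases, can be illustrated with a minimal Python sketch. This is not the authors' H-struct or H-mine implementation: the function name `mine_frequent_itemsets`, the `(row, position)` link representation, and the sample data are all assumptions made for illustration. Unlike the published method, this sketch builds new link lists at each recursion level rather than adjusting links in place, and it omits the partitioning and FP-tree switch described above.

```python
from collections import defaultdict


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support} for all itemsets meeting min_support.

    Hypothetical simplification of link-based mining: transactions are
    stored once, and recursion passes around (row, position) links only.
    """
    # First scan: count how many transactions contain each item.
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    frequent = {i for i, c in counts.items() if c >= min_support}

    # Store each transaction once, keeping only frequent items in a fixed order.
    rows = [sorted(set(t) & frequent) for t in transactions]
    results = {}

    def mine(prefix, links):
        # Each link (r, p) points at the part of rows[r] that follows the prefix.
        local = defaultdict(list)
        for r, p in links:
            row = rows[r]
            for q in range(p, len(row)):
                # Link to the suffix of this row that comes after row[q].
                local[row[q]].append((r, q + 1))
        for item in sorted(local):
            item_links = local[item]
            if len(item_links) >= min_support:
                itemset = prefix + (item,)
                results[itemset] = len(item_links)
                mine(itemset, item_links)

    mine((), [(r, 0) for r in range(len(rows))])
    return results


# Illustrative data only; not taken from the paper.
sample = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
patterns = mine_frequent_itemsets(sample, min_support=3)
```

Because the recursion carries only integer pairs, the working memory beyond the stored transactions grows with the number of links rather than with copies of the data, which is the property the abstract calls space-preserving.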
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- Operations Research
- 49 Mathematical sciences
- 40 Engineering
- 35 Commerce, management, tourism and services
- 15 Commerce, Management, Tourism and Services
- 09 Engineering
- 01 Mathematical Sciences