Skip to main content

Mining sequential patterns by pattern-growth: The prefixspan approach

Publication ,  Journal Article
Pei, J; Han, J; Mortazavi-Asl, B; Wang, J; Pinto, H; Chen, Q; Dayal, U; Hsu, MC
Published in: IEEE Transactions on Knowledge and Data Engineering
November 1, 2004

Sequential pattern mining is an important data mining problem with broad applications. However, it Is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [8], we propose a more efficient method, called PSP, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the a priori-based algorithm GSP, FreeSpan, and SPADE [29] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.

Duke Scholars

Altmetric Attention Stats
Dimensions Citation Stats

Published In

IEEE Transactions on Knowledge and Data Engineering

DOI

ISSN

1041-4347

Publication Date

November 1, 2004

Volume

16

Issue

11

Start / End Page

1424 / 1440

Related Subject Headings

  • Information Systems
  • 46 Information and computing sciences
  • 08 Information and Computing Sciences
 

Citation

APA
Chicago
ICMJE
MLA
NLM
Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., … Hsu, M. C. (2004). Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Transactions on Knowledge and Data Engineering, 16(11), 1424–1440. https://doi.org/10.1109/TKDE.2004.77
Pei, J., J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. C. Hsu. “Mining sequential patterns by pattern-growth: The prefixspan approach.” IEEE Transactions on Knowledge and Data Engineering 16, no. 11 (November 1, 2004): 1424–40. https://doi.org/10.1109/TKDE.2004.77.
Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, et al. Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Transactions on Knowledge and Data Engineering. 2004 Nov 1;16(11):1424–40.
Pei, J., et al. “Mining sequential patterns by pattern-growth: The prefixspan approach.” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 11, Nov. 2004, pp. 1424–40. Scopus, doi:10.1109/TKDE.2004.77.
Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu MC. Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Transactions on Knowledge and Data Engineering. 2004 Nov 1;16(11):1424–1440.

Published In

IEEE Transactions on Knowledge and Data Engineering

DOI

ISSN

1041-4347

Publication Date

November 1, 2004

Volume

16

Issue

11

Start / End Page

1424 / 1440

Related Subject Headings

  • Information Systems
  • 46 Information and computing sciences
  • 08 Information and Computing Sciences