Skip to main content
Journal cover image

Mining top-k sequential patterns in transaction database graphs: A new challenging problem and a sampling-based approach

Publication ,  Journal Article
Lei, M; Chu, L; Wang, Z; Pei, J; He, C; Zhang, X; Fang, B
Published in: World Wide Web
January 1, 2020

In many real world networks, a vertex is usually associated with a transaction database that comprehensively describes the behaviour of the vertex. A typical example is a social network, where the behaviours of every user are depicted by a transaction database that stores her daily posted contents. Specifically, a transaction database consists of a collection of transactions, where each transaction corresponds to a piece of tweet. For each transaction, it consists of a set of items, where each item may correspond to a keyword or a piece of video clip contained in this tweet. To model such type of scenario, we propose the novel notion of the transaction database graph, where each vertex is associated with a transaction database. Every path of the graph is a sequence of vertices that induces multiple sequences of transactions. The sequences of transactions induced by all of the paths in the graph form an extremely large sequence database. Finding frequent sequential patterns from such sequence database discovers interesting subsequences that frequently appear in many paths of the network. Our goal is to find the top-k frequent sequential patterns in the sequence database induced from a transaction database graph. However, it is challenging since the sequence database induced by a transaction database graph is too large to be explicitly induced and stored, and finding the top-k frequent sequential patterns is #P-hard. To tackle this problem, we propose an efficient two-step sampling algorithm that approximates the top-k frequent sequential patterns with the provable quality guarantee. Extensive experimental results on synthetic and real-world data sets demonstrate the effectiveness and efficiency of our method.

Duke Scholars

Published In

World Wide Web

DOI

EISSN

1573-1413

ISSN

1386-145X

Publication Date

January 1, 2020

Volume

23

Issue

1

Start / End Page

103 / 130

Related Subject Headings

  • Information Systems
  • 46 Information and computing sciences
  • 0806 Information Systems
  • 0805 Distributed Computing
  • 0804 Data Format
 

Citation

APA
Chicago
ICMJE
MLA
NLM
Lei, M., Chu, L., Wang, Z., Pei, J., He, C., Zhang, X., & Fang, B. (2020). Mining top-k sequential patterns in transaction database graphs: A new challenging problem and a sampling-based approach. World Wide Web, 23(1), 103–130. https://doi.org/10.1007/s11280-019-00686-w
Lei, M., L. Chu, Z. Wang, J. Pei, C. He, X. Zhang, and B. Fang. “Mining top-k sequential patterns in transaction database graphs: A new challenging problem and a sampling-based approach.” World Wide Web 23, no. 1 (January 1, 2020): 103–30. https://doi.org/10.1007/s11280-019-00686-w.
Lei M, Chu L, Wang Z, Pei J, He C, Zhang X, et al. Mining top-k sequential patterns in transaction database graphs: A new challenging problem and a sampling-based approach. World Wide Web. 2020 Jan 1;23(1):103–30.
Lei, M., et al. “Mining top-k sequential patterns in transaction database graphs: A new challenging problem and a sampling-based approach.” World Wide Web, vol. 23, no. 1, Jan. 2020, pp. 103–30. Scopus, doi:10.1007/s11280-019-00686-w.
Lei M, Chu L, Wang Z, Pei J, He C, Zhang X, Fang B. Mining top-k sequential patterns in transaction database graphs: A new challenging problem and a sampling-based approach. World Wide Web. 2020 Jan 1;23(1):103–130.
Journal cover image

Published In

World Wide Web

DOI

EISSN

1573-1413

ISSN

1386-145X

Publication Date

January 1, 2020

Volume

23

Issue

1

Start / End Page

103 / 130

Related Subject Headings

  • Information Systems
  • 46 Information and computing sciences
  • 0806 Information Systems
  • 0805 Distributed Computing
  • 0804 Data Format