Mining discriminative items in multiple data streams

Publication, Journal Article
Lin, Z; Jiang, B; Pei, J; Jiang, D
Published in: World Wide Web
July 12, 2010

How can we maintain a dynamic profile capturing a user's reading interest against the common interest? What are the queries that have been asked 1,000 times more frequently to a search engine from users in Asia than in North America? What are the keywords (or tags) that are 1,000 times more frequent in the blog stream on computer games than in the blog stream on Hollywood movies? To answer such questions, we need to find discriminative items in multiple data streams. Each data source, such as Web search queries in a region or blog postings on a topic, can be modeled as a data stream because of its fast-growing volume. Motivated by these extensive applications, in this paper we study the problem of mining discriminative items in multiple data streams. We show that, to exactly find all discriminative items in stream S1 against stream S2 by one scan, the space lower bound is, where Σ is the alphabet of items and n1 is the current size of S1. To tackle the space challenge, we develop three heuristic algorithms that can achieve high precision and recall using sub-linear space and sub-linear processing time per item with respect to |Σ|. The complexity of all three algorithms is independent of the sizes of the two streams. An extensive empirical study using both real and synthetic data sets verifies our design. © 2010 Springer Science+Business Media, LLC.
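The abstract does not state the formal definition of a discriminative item, so the following is only a minimal sketch under an assumed reading: an item is discriminative in S1 against S2 if its relative frequency in S1 is at least theta times its relative frequency in S2. The function name `discriminative_items` and the parameter `theta` are hypothetical, and the treatment of items absent from S2 (always reported) is also an assumption. This is the exact, counter-per-item baseline whose space grows with the alphabet size, not one of the paper's three heuristic algorithms.

```python
from collections import Counter

def discriminative_items(s1, s2, theta):
    """Exact baseline (assumed definition, not the paper's algorithms).

    Reports items whose relative frequency in s1 is at least `theta`
    times their relative frequency in s2. Keeps one counter per distinct
    item, so space is linear in the alphabet size.
    """
    c1, c2 = Counter(s1), Counter(s2)
    n1, n2 = len(s1), len(s2)
    result = set()
    for item, f1 in c1.items():
        f2 = c2.get(item, 0)
        # Cross-multiplied form of (f1 / n1) >= theta * (f2 / n2),
        # which avoids dividing by zero when f2 or n2 is 0.
        if f1 * n2 >= theta * f2 * n1:
            result.add(item)
    return result

# Toy streams: "a" is 3x as frequent in s1 as in s2, "b" is rarer in s1.
s1 = ["a", "a", "a", "b"]
s2 = ["a", "b", "b", "b"]
print(discriminative_items(s1, s2, theta=2))
```

Because every distinct item gets its own counter in both streams, this baseline illustrates the space cost that motivates the sub-linear heuristics described in the abstract.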

Published In

World Wide Web

DOI

10.1007/s11280-010-0094-0

ISSN

1386-145X

Publication Date

July 12, 2010

Volume

13

Issue

4

Start / End Page

497 / 522

Related Subject Headings

  • Information Systems
  • 46 Information and computing sciences
  • 0806 Information Systems
  • 0805 Distributed Computing
  • 0804 Data Format
 

Citation

APA
Lin, Z., Jiang, B., Pei, J., & Jiang, D. (2010). Mining discriminative items in multiple data streams. World Wide Web, 13(4), 497–522. https://doi.org/10.1007/s11280-010-0094-0

Chicago
Lin, Z., B. Jiang, J. Pei, and D. Jiang. “Mining discriminative items in multiple data streams.” World Wide Web 13, no. 4 (July 12, 2010): 497–522. https://doi.org/10.1007/s11280-010-0094-0.

ICMJE
Lin Z, Jiang B, Pei J, Jiang D. Mining discriminative items in multiple data streams. World Wide Web. 2010 Jul 12;13(4):497–522.

MLA
Lin, Z., et al. “Mining discriminative items in multiple data streams.” World Wide Web, vol. 13, no. 4, July 2010, pp. 497–522. Scopus, doi:10.1007/s11280-010-0094-0.

NLM
Lin Z, Jiang B, Pei J, Jiang D. Mining discriminative items in multiple data streams. World Wide Web. 2010 Jul 12;13(4):497–522.