Mining query subtopics from search log data

Publication, Conference
Hu, Y; Qian, Y; Li, H; Jiang, D; Pei, J; Zheng, Q
Published in: SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval
September 28, 2012

Most queries in web search are ambiguous and multifaceted. Identifying the major senses and facets of queries from search log data, referred to as query subtopic mining in this paper, is a very important issue in web search. Through search log analysis, we show that there are two interesting phenomena of user behavior that can be leveraged to identify query subtopics, referred to as 'one subtopic per search' and 'subtopic clarification by keyword'. One subtopic per search means that if a user clicks multiple URLs in one query, then the clicked URLs tend to represent the same sense or facet. Subtopic clarification by keyword means that users often add one or more keywords to expand the query in order to clarify their search intent. Thus, the keywords tend to be indicative of the sense or facet. We propose a clustering algorithm that can effectively leverage the two phenomena to automatically mine the major subtopics of queries, where each subtopic is represented by a cluster containing a number of URLs and keywords. The mined subtopics of queries can be used in multiple tasks in web search, and we evaluate them on aspects of search result presentation such as clustering and re-ranking. We demonstrate that our clustering algorithm can effectively mine query subtopics with an F1 measure in the range of 0.896-0.956. Our experimental results show that the use of the subtopics mined by our approach can significantly improve the state-of-the-art methods used for search result clustering. Experimental results based on click data also show that re-ranking search results based on our method can significantly improve the efficiency with which users find information. © 2012 ACM.
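
The two phenomena described in the abstract can be illustrated with a small sketch. This is not the authors' clustering algorithm, which the abstract does not specify. It is a deliberately simplified stand-in that merges URLs clicked in the same search using union-find and attaches the extra keywords of expanded queries to the resulting clusters. The log format (pairs of issued query and clicked URLs), the function name `mine_subtopics`, and the rule that an expansion adds keywords before or after the original query are all assumptions made for illustration.

```python
from collections import defaultdict


def mine_subtopics(query, log):
    """Group clicked URLs and clarifying keywords into candidate subtopics.

    `log` is a hypothetical iterable of (issued_query, clicked_urls) pairs.
    Simplification: any co-click merges two URLs, with no similarity threshold.
    """
    q = query.split()
    n = len(q)
    parent = {}  # union-find forest over URLs

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    keywords_for_url = defaultdict(set)
    for issued, urls in log:
        tokens = issued.split()
        # Subtopic clarification by keyword: keep the original query and
        # queries that extend it with extra words before or after.
        if tokens[:n] == q:
            extra = tokens[n:]
        elif tokens[-n:] == q:
            extra = tokens[:-n]
        else:
            continue
        urls = list(urls)
        if not urls:
            continue
        # One subtopic per search: URLs clicked together share a cluster.
        for u in urls:
            find(u)
        for other in urls[1:]:
            parent[find(other)] = find(urls[0])
        keywords_for_url[urls[0]].update(extra)

    clusters = defaultdict(lambda: {"urls": set(), "keywords": set()})
    for u in parent:
        root = find(u)
        clusters[root]["urls"].add(u)
        clusters[root]["keywords"] |= keywords_for_url.get(u, set())
    # Sort for a deterministic order of clusters.
    return sorted(clusters.values(), key=lambda c: sorted(c["urls"]))
```

Each returned cluster holds a set of URLs and a set of keywords, mirroring the abstract's description of a subtopic. A realistic implementation would need to weight clicks and tolerate noisy co-clicks rather than merging on any single shared search.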

Published In

SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval

DOI

10.1145/2348283.2348327

Publication Date

September 28, 2012

Start / End Page

305 / 314
 

Citation

APA: Hu, Y., Qian, Y., Li, H., Jiang, D., Pei, J., & Zheng, Q. (2012). Mining query subtopics from search log data. In SIGIR’12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 305–314). https://doi.org/10.1145/2348283.2348327
Chicago: Hu, Y., Y. Qian, H. Li, D. Jiang, J. Pei, and Q. Zheng. “Mining query subtopics from search log data.” In SIGIR’12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 305–14, 2012. https://doi.org/10.1145/2348283.2348327.
ICMJE: Hu Y, Qian Y, Li H, Jiang D, Pei J, Zheng Q. Mining query subtopics from search log data. In: SIGIR’12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval. 2012. p. 305–14.
MLA: Hu, Y., et al. “Mining query subtopics from search log data.” SIGIR’12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012, pp. 305–14. Scopus, doi:10.1145/2348283.2348327.
NLM: Hu Y, Qian Y, Li H, Jiang D, Pei J, Zheng Q. Mining query subtopics from search log data. SIGIR’12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval. 2012. p. 305–314.
