Feed following: The big data challenge in social applications
Internet users spend billions of minutes per month on sites like Facebook and Twitter. These sites support feed following, in which users "follow" activity streams associated with other users and entities. Followers get personalized feeds that blend the streams produced by those they follow. The emphasis on recency and relevance, and the highly variable fan-out of the follows graph, make this feature difficult to implement at the scale seen in major social networks. In this paper, we place feed following in the context of existing research areas and highlight the novel data management challenges that it poses, with the goal of stimulating research in this new direction. We discuss solutions based on pub/sub, caching, and materialized views, and argue that none of these existing approaches fully exploits the unique characteristics of feed following. The number of distinct queries and the query rate that a feed following system must support are huge, but the queries have a simple structure and overlap with one another. The system must handle high-throughput input streams, but results are heavily biased toward recent events. The number of users is large, but they exhibit diurnal behavior, and we can dynamically modify the system to optimize for currently active users. These characteristics offer many opportunities for optimization, and the potential gains are substantial. Copyright © 2011 ACM.