A methodology for estimating interdomain Web traffic demand
This paper introduces a methodology for estimating interdomain Web traffic flows between all clients worldwide and the servers belonging to over one thousand content providers. The idea is to use the server logs from a large Content Delivery Network (CDN) to identify client downloads of content provider (i.e., publisher) Web pages. For each of these Web pages, a client typically downloads some objects from the content provider, some from the CDN, and perhaps some from third parties such as banner advertisement agencies. The sizes and sources of the non-CDN downloads associated with each CDN download are estimated separately by examining Web accesses in packet traces collected at several universities. The methodology produces a (time-varying) interdomain HTTP traffic demand matrix pairing several hundred thousand blocks of client IP addresses with over ten thousand individual Web servers. When combined with geographical databases and routing tables, the matrix can be used to provide (partial) answers to questions such as "How do Web access patterns vary by country?", "Which autonomous systems host the most Web content?", and "How stable are Web traffic flows over time?" Copyright 2004 ACM.
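The core estimation step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the packet traces have already been reduced to a per-publisher table of "non-CDN bytes per CDN byte" for each non-CDN server, and that each CDN log record carries a client address block, a publisher, and a byte count. The function name `estimate_demand`, the record layout, the linear scaling rule, and all sample values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def estimate_demand(cdn_log, ratios):
    """Estimate non-CDN bytes per (client block, server) pair.

    cdn_log: iterable of (client_block, publisher, cdn_bytes) records
             (hypothetical layout for CDN server log entries).
    ratios:  publisher -> {server: non-CDN bytes per CDN byte}, assumed
             to have been derived beforehand from packet traces.
    """
    demand = defaultdict(float)
    for client_block, publisher, cdn_bytes in cdn_log:
        # Scale each observed CDN download by the trace-derived ratio
        # for every non-CDN server associated with this publisher.
        for server, ratio in ratios.get(publisher, {}).items():
            demand[(client_block, server)] += cdn_bytes * ratio
    return dict(demand)


# Made-up sample data using documentation-reserved prefixes and domains.
cdn_log = [
    ("192.0.2.0/24", "news.example", 1000),
    ("192.0.2.0/24", "news.example", 500),
    ("198.51.100.0/24", "shop.example", 2000),
]
ratios = {
    "news.example": {"origin.news.example": 2.0, "ads.example": 0.5},
    "shop.example": {"origin.shop.example": 1.5},
}

demand = estimate_demand(cdn_log, ratios)
for (client_block, server), est_bytes in sorted(demand.items()):
    print(client_block, server, est_bytes)
```

Keying the result by (client block, server) mirrors the demand matrix described in the abstract. A time-varying version could add a time bucket to the key, and joining the keys against routing tables or geographical databases would support the per-country and per-autonomous-system questions listed above.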