As with most techniques for web data collection, we first had to determine how much real traffic was coming in. Every web site is constantly hit by search engine crawlers, bots, and spam requests, and the Yahoo! network is no exception.
We combined access logs with beacon data (previously included in the page) and filtered out the automated requests, leaving a set of requests that we could confirm were sent by actual users. This data, which is completely anonymous, gave us a good indication of traffic patterns in several countries.
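To make the filtering step concrete, here is a minimal sketch of how automated requests might be separated from user requests. It is not the filter used in the study: the `BOT_MARKERS` list, the record fields (`user_agent`, `country`), and the rule that an empty user agent counts as automated are all assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical substrings that often appear in crawler user agents.
# This list is an illustrative assumption, not the one used in the study.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_automated(user_agent: str) -> bool:
    """Guess whether a request came from an automated client."""
    ua = user_agent.strip().lower()
    if not ua:
        # Assumption: requests without a user agent are treated as automated.
        return True
    return any(marker in ua for marker in BOT_MARKERS)


def filter_user_requests(requests: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the records that do not look automated."""
    return [r for r in requests if not is_automated(r.get("user_agent", ""))]


sample = [
    {"user_agent": "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6", "country": "US"},
    {"user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)", "country": "US"},
    {"user_agent": "", "country": "BR"},
]
kept = filter_user_requests(sample)
```

A user-agent check like this catches only clients that identify themselves, so in practice it would probably be one signal among several rather than the whole filter.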
It is important to point out that Yahoo! sites in different countries receive differing amounts of traffic from varying locations, so generalizing about user populations is difficult. Also, U.S.-based Yahoo! sites receive a significant amount of traffic from outside the U.S., so the U.S. number is influenced as much by visitors from outside the U.S. as by visitors inside it.