Cache Yahoo! Web Service Calls using Ruby

One way of dramatically speeding up applications that are built using the Yahoo! Web Services APIs is to make heavy use of caching. With caching, calls that have been previously made in a specified time frame can be answered using cached data rather than making an API call over the network.

Caching Intro

The level of caching you use should take into account the kind of data you are retrieving. You might be building an application that redisplays your Flickr photo sets on your own site. These sets change rarely, and you don't mind a delay of up to 12 hours before a new set appears on your site. Contrast this with redisplaying your most recent links from del.icio.us, which you might want to show up on your own site straight away, or within 5 or 10 minutes.
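One way to keep such decisions in one place is a small table of maximum ages. The following is only a sketch: the constant MAX_AGE, the helper max_age_for and the source names are hypothetical, and the numbers simply restate the examples above.

```ruby
# Hypothetical freshness policy: maximum cache age, in seconds, per data source.
# The source names and the values are illustrative assumptions.
MAX_AGE = {
  flickr_photosets: 12 * 60 * 60, # photo sets change rarely: up to 12 hours
  delicious_links:  5 * 60        # recent links should appear within 5 minutes
}.freeze

# Returns the maximum age for a source, or 0 (never answer from the cache)
# when the source is not listed.
def max_age_for(source)
  MAX_AGE.fetch(source, 0)
end
```

The value returned by max_age_for can then be passed as the maximum age whenever data from that source is requested.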

This HOWTO describes two methods of caching at the HTTP retrieval layer.

Caching in memory

Caching in memory is memoization with an added freshness check: a stored response is reused only while it is younger than a given maximum age.

require 'net/http'

class MemFetcher
   def initialize
      # the cache maps each URL to a [fetch time, response body] pair
      @cache = {}
   end
   def fetch(url, max_age=0)
      # if the API URL exists as a key in the cache and the entry is
      #  younger than max_age seconds, we just return the cached body
      if @cache.has_key? url
         return @cache[url][1] if Time.now-@cache[url][0]<max_age
      end
      # if the URL does not exist in cache or the data is not fresh,
      #  we fetch again, store the body with the current time and
      #  return the body (not the [time, body] pair)
      body = Net::HTTP.get_response(URI.parse(url)).body
      @cache[url] = [Time.now, body]
      body
   end
end

Assuming the class is saved as cache.rb on Ruby's load path, create an instance of the MemFetcher class:

irb(main):001:0> require 'cache'
=> true
irb(main):002:0> fetcher = MemFetcher.new
=> #<MemFetcher:0x4d6ec8 @cache={}>

Now retrieve a URL, specifying that it should not be fetched again if it has been cached in the last 60 seconds:

irb(main):003:0> fetcher.fetch('http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&results=10', 60)

If you try this in an interactive prompt there should be a short delay before the data is returned. Run the command again and the data will be returned instantly; it is already in the cache.
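The hit-or-miss behaviour can also be checked without touching the network. The sketch below repeats the same freshness logic but takes the expensive lookup as a block; the class name BlockCache and the counter are hypothetical and exist only for illustration.

```ruby
# Hypothetical stand-in for MemFetcher: the same freshness check,
# with the HTTP request replaced by a caller-supplied block.
class BlockCache
  def initialize
    @cache = {}
  end

  def fetch(key, max_age = 0)
    entry = @cache[key]
    # reuse the stored value only while it is younger than max_age seconds
    return entry[1] if entry && Time.now - entry[0] < max_age
    value = yield(key)
    @cache[key] = [Time.now, value]
    value
  end
end

calls = 0
cache = BlockCache.new
first  = cache.fetch('k', 60) { calls += 1; 'payload' } # miss: the block runs
second = cache.fetch('k', 60) { calls += 1; 'payload' } # hit: the block is skipped
third  = cache.fetch('k', 0)  { calls += 1; 'payload' } # max_age 0 always misses
```

After these three calls the block has run twice: once for the initial miss and once for the call with a maximum age of 0.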

Caching to disk

MemFetcher is only useful for long-running Ruby programs, as the cache itself is stored in memory. Here is an alternative implementation that saves cached data to disk; this can be used by multiple Ruby processes.

require 'net/http'
require 'digest/md5'

class DiskFetcher
   def initialize(cache_dir='/tmp')
      # this is the dir where we store our cache
      @cache_dir = cache_dir
   end
   def fetch(url, max_age=0)
      # the cache file is named after the MD5 hexdigest of the URL
      file = Digest::MD5.hexdigest(url)
      file_path = File.join(@cache_dir, file)
      # we check if the file exists in the dir. If it does and the
      #  data is fresh, we just read data from the file and return
      if File.exist? file_path
         return File.read(file_path) if Time.now-File.mtime(file_path)<max_age
      end
      # if the file does not exist (or if the data is not fresh), we
      #  make an HTTP request, save the body to the file and return it
      body = Net::HTTP.get_response(URI.parse(url)).body
      File.open(file_path, "w") do |data|
         data << body
      end
      body
   end
end

Usage is similar to the in-memory cache:

irb(main):001:0> require 'cache'
=> true
irb(main):002:0> fetcher = DiskFetcher.new
=> #<DiskFetcher:0x4d0424>
irb(main):003:0> fetcher.fetch('http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&results=10', 60)

If no argument is provided to the DiskFetcher constructor, cache files are stored in /tmp, the usual temporary directory on Unix-based systems.
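The /tmp default is hard-coded, so it will not suit every platform. A more portable choice, sketched here under the assumption that a dedicated subdirectory is wanted, is to ask Ruby for the system temporary directory with Dir.tmpdir. The helper default_cache_dir and the directory name yws-cache are hypothetical.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: returns (and creates if needed) a cache directory
# below the system temporary directory reported by Dir.tmpdir.
def default_cache_dir(name = 'yws-cache')
  dir = File.join(Dir.tmpdir, name)
  FileUtils.mkdir_p(dir) # no error if the directory already exists
  dir
end
```

The result can be passed to the constructor, for example DiskFetcher.new(default_cache_dir), which also keeps the cache files apart from other files in the temporary directory.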

These classes can now be used in place of direct calls to Net::HTTP methods. This provides a simple mechanism for caching API calls, speeding up your application and reducing the overall number of calls you have to make.
