Parse XML using Ruby

The typical way to parse XML in Ruby is to use REXML, which ships with Ruby: as part of the standard library in older versions, and as a bundled gem since Ruby 3.0.

Using REXML

To illustrate how to use REXML to parse XML data returned by Yahoo! APIs, the following example extracts the titles and links from the response to a Yahoo! web search:

require 'net/http'
require 'rexml/document'

# Web search for "madonna"
url = 'http://api.search.yahoo.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&results=2'

# get the XML data as a string
xml_data = Net::HTTP.get_response(URI.parse(url)).body

# extract the title and URL of each search result
doc = REXML::Document.new(xml_data)
titles = []
links = []
doc.elements.each('ResultSet/Result/Title') do |ele|
   titles << ele.text
end
doc.elements.each('ResultSet/Result/Url') do |ele|
   links << ele.text
end

# print each title with its matching link
titles.each_with_index do |title, idx|
   puts "#{title} => #{links[idx]}"
end

The code above prints the title and link of each result, one pair per line.
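
The same extraction can also be sketched without a network call, visiting each Result element once so that a title and its link are read together instead of being collected into two parallel arrays. The sample document, the SAMPLE_XML constant, and the extract_results helper below are hypothetical and only mirror the ResultSet/Result/Title/Url layout used above; they are not real API output.

```ruby
require 'rexml/document'

# Hypothetical sample shaped like the ResultSet/Result layout used above;
# it is hand-written, not captured from the web service.
SAMPLE_XML = <<~XML
  <ResultSet>
    <Result>
      <Title>First result</Title>
      <Url>http://example.com/1</Url>
    </Result>
    <Result>
      <Title>Second result</Title>
      <Url>http://example.com/2</Url>
    </Result>
  </ResultSet>
XML

# Hypothetical helper: returns an array of [title, url] pairs,
# one pair per Result element.
def extract_results(xml)
  doc = REXML::Document.new(xml)
  pairs = []
  doc.elements.each('ResultSet/Result') do |result|
    title = result.elements['Title']
    url = result.elements['Url']
    # Skip results that are missing either child element
    next if title.nil? || url.nil?
    pairs << [title.text, url.text]
  end
  pairs
end

extract_results(SAMPLE_XML).each do |title, url|
  puts "#{title} => #{url}"
end
```

Reading both children from the same Result element keeps each title paired with its own link even if one result happens to lack a Url.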

Using XmlSimple

Not everyone finds the REXML API intuitive. XmlSimple, a Ruby port of the Perl XML::Simple module, is a simpler alternative. It is distributed as the xml-simple gem.

Let's try the same web search example using XmlSimple.

require 'net/http'
require 'rubygems' # only needed on Ruby 1.8; later versions load RubyGems automatically
require 'xmlsimple'

url = 'http://api.search.yahoo.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&results=2'
xml_data = Net::HTTP.get_response(URI.parse(url)).body

data = XmlSimple.xml_in(xml_data)

# Each child element maps to an array of values, so take the first entry of each
data['Result'].each do |item|
   puts "#{item['Title'][0]} => #{item['Url'][0]}"
end

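XmlSimple "slurps" the XML data and converts it to a native Ruby data structure of nested hashes and arrays, which is why the code above can index the results directly with data['Result'].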
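
To make that structure concrete, here is a rough sketch of what the parsed data might look like for a two-result response. It assumes XmlSimple's default options, under which the root element is dropped and every child element becomes a hash key holding an array of values. The SAMPLE_DATA hash is written out by hand rather than captured from XmlSimple, and format_results is a hypothetical helper, so the snippet runs without the gem installed.

```ruby
# Assumed shape of the value returned by XmlSimple.xml_in (default options)
# for a hypothetical two-result response; hand-written, not real output.
SAMPLE_DATA = {
  'Result' => [
    { 'Title' => ['First result'],  'Url' => ['http://example.com/1'] },
    { 'Title' => ['Second result'], 'Url' => ['http://example.com/2'] }
  ]
}

# Hypothetical helper: turns the nested structure into "title => url" lines.
def format_results(data)
  # fetch with a default copes with a response that has no Result elements
  data.fetch('Result', []).map do |item|
    title = Array(item['Title']).first
    url = Array(item['Url']).first
    "#{title} => #{url}"
  end
end

puts format_results(SAMPLE_DATA)
```

Because every value is wrapped in an array, single-valued elements such as Title are read with [0] or first.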

Further reading

Yahoo Forum Discussions