Widgetbox, YQL, and Pipes

The tools we offer at Widgetbox allow publishers and casual developers to quickly turn structured data, such as RSS or Atom feeds, into Web widgets and mobile sites. However, despite the prevalence of blogs and CMS installations, not all publishers have access to their own data in an easy-to-use format like XML. Small companies, such as independent retail shops and restaurants, still rely on static websites.

Yahoo! provides the tools to bridge the gap between static or unstructured content and Widgetbox tools, such as our blog widget and Mobile Site Builder, which require structured content. Yahoo! Query Language (YQL) is a single REST endpoint with a SQL-like syntax for accessing data from a large variety of web services.

YQL also exposes a collection of data manipulation tables that make working with existing data, such as HTML, much more concise than traditional screen-scraping techniques.
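To make the "single REST endpoint" idea concrete, here is a minimal Python sketch of how a YQL statement could be carried in a GET request. The endpoint URL and the `q` and `format` parameter names are assumptions based on the historical public YQL service and are not stated in this post. The `build_yql_url` helper and the example.com address are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the historical public YQL endpoint; this post does not name it.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_url(query: str, fmt: str = "xml") -> str:
    """Return a GET URL that carries a YQL statement in the `q` parameter."""
    # urlencode percent-escapes the statement so it is safe in a query string.
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


# Hypothetical statement against the html table; the page URL is a placeholder.
url = build_yql_url('select * from html where url="http://example.com/menu.html"')
print(url)
```

The sketch only builds the URL and does not send it, so it runs without network access.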

Suppose a restaurant has a menu on its website with the following HTML structure:

Code for the menu

The following YQL query selects all of the content (*) from the list item tags in any unordered list with the class "menu" using an XPath selector.

select * from html where url="http://myrestaurant.com/menu.html" and

The result is an XML result set that is one step closer to being transformed into a widget or mobile site.
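To illustrate the kind of selection described above without calling YQL, the following Python sketch applies a comparable path expression to a small, well-formed menu document using the standard library. The markup, the dish names, and the `select_menu_items` helper are all hypothetical. The paragraph and span tags inside each list item are an assumption about the menu structure. ElementTree supports only a subset of XPath, so the expression here is a stand-in and not the selector from the query above.

```python
import xml.etree.ElementTree as ET

# Hypothetical menu markup; it must be well-formed XML for ElementTree to parse.
MENU_HTML = """
<html><body>
  <ul class="menu">
    <li><p>Margherita Pizza</p><span>Tomato, mozzarella, basil</span></li>
    <li><p>Caesar Salad</p><span>Romaine, parmesan, croutons</span></li>
  </ul>
  <ul class="hours"><li>Open daily</li></ul>
</body></html>
"""


def select_menu_items(markup: str) -> list:
    """Return the <li> elements of every <ul> whose class is exactly 'menu'."""
    root = ET.fromstring(markup)
    return root.findall(".//ul[@class='menu']/li")


items = select_menu_items(MENU_HTML)
for li in items:
    # Each item carries its name in <p> and its description in <span>.
    print(li.findtext("p"), "-", li.findtext("span"))
```

The list with the class "hours" is ignored, which is the same filtering effect the class-based selector gives the YQL query.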

Using the YQL html data table to screen-scrape a static web page:

Code that can be transformed more easily

While the YQL XML results could be transformed through a content proxy using XSLT, Yahoo! Pipes already provides the rest of the tools needed to transform the raw XML into nicely formatted RSS. When you combine the YQL query and the Create RSS Pipes operator, and then map the paragraph and span tags from each menu item to the title and description elements of an RSS feed item, you transform the static HTML menu into an RSS feed.

Updating the HTML menu by hand updates the RSS feed automatically.
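For readers who want to see the mapping spelled out, here is a rough Python sketch of the same idea: each menu item becomes an RSS item whose title comes from the paragraph text and whose description comes from the span text. This is not how Pipes works internally. The `menu_items_to_rss` helper, the sample dishes, and the example.com link are hypothetical.

```python
import xml.etree.ElementTree as ET


def menu_items_to_rss(items, feed_title: str, feed_link: str) -> str:
    """Build a minimal RSS 2.0 document from (name, details) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_title
    for name, details in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = name           # from the <p> text
        ET.SubElement(item, "description").text = details  # from the <span> text
    return ET.tostring(rss, encoding="unicode")


# Hypothetical menu data standing in for the scraped list items.
sample_items = [
    ("Margherita Pizza", "Tomato, mozzarella, basil"),
    ("Caesar Salad", "Romaine, parmesan, croutons"),
]
feed = menu_items_to_rss(sample_items, "Restaurant Menu", "http://example.com/menu.html")
print(feed)
```

Because the feed is regenerated from the scraped items each time, a change to the source menu shows up in the output without any extra step.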

Rewiring YQL raw XML into RSS

The Open Data Tables collection enables developers or content providers to open access to their Web-based data through YQL. Open Data Tables helps to unify API endpoints that can change between API versions, and it simplifies syntax by using the YQL query syntax. While YQL provides a wealth of data and access to many Yahoo! APIs by default, Open Data Tables allows for a mapping between YQL and any web service.

For instance, continuing with our restaurant example, Yelp already provides RSS feeds for restaurant reviews. Any of those RSS feeds could be plugged directly into a Widgetbox blog widget or a mobile site feed page. However, a restaurant will most likely want to feature its best reviews. While Yelp ratings are inserted into the title of each review, YQL queries and Pipes manipulations make highlighting the positive reviews much easier.

If you enable community data tables and enter the following query, you search for the term "Horizon restaurant" in San Francisco, limited to reviews rated four stars or better.

select reviews from yelp.review.search where term='horizon restaurant'
and reviews.rating > 3 and location='San Francisco, ca' and

The result is YQL XML with more information than the regular Yelp review feed.
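The rating condition in the query keeps only reviews rated above three stars. As a small local illustration of that filter, and not of the Yelp table itself, the Python sketch below applies the same comparison to a list of records. The review data, the field names, and the `positive_reviews` helper are hypothetical and do not reflect Yelp's actual schema.

```python
# Hypothetical review records; field names are illustrative, not Yelp's schema.
sample_reviews = [
    {"author": "A.", "rating": 5, "text": "Great brunch."},
    {"author": "B.", "rating": 3, "text": "Average service."},
    {"author": "C.", "rating": 4, "text": "Lovely patio."},
]


def positive_reviews(reviews, min_exclusive: int = 3) -> list:
    """Keep reviews whose rating is strictly greater than min_exclusive."""
    return [r for r in reviews if r["rating"] > min_exclusive]


best = positive_reviews(sample_reviews)
for review in best:
    print(review["rating"], review["text"])
```

Doing the filtering in the query itself, as the post does, means the feed consumer never has to see the lower-rated reviews at all.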

Searching for positive Yelp reviews with YQL

Again, using Pipes to map the reviews to RSS items makes the result usable in any feed-based Widgetbox product.

Converting Yelp reviews to RSS

Creating a mobile site for our example restaurant turns its otherwise static web site into a mobile site with just a few more clicks.

Mobile restaurant menu

For comparison, here's the original static web site:

Static restaurant web site

Guest post by Jeff Remer, Software Engineer, Widgetbox