A short adventure with Yahoo! Pipes

Editor's Note: Warwickshire County Council in the UK recently began to open up their local public data, as part of a wider Open Government Data initiative being spearheaded by Tim Berners-Lee. This guest post by Graham Hyde describes his entry in the Hack Warwickshire competition, which opened on May 1 in search of exciting applications, web sites, mashups, and visualizations built using Warwickshire Open Data. Graham is an Applications Database Analyst at Warwickshire County Council, and you can email Graham. See the Editor's Notes for information on Yahoo! Pipes.

It comes to something when you weep tears of joy, not sorrow, after plotting a Warwickshire-shaped group of points on a map in the middle of the Indian Ocean. That’s what you get if you mix up latitude and longitude. And equally, that’s what you get if, until recently, you believed a “mashup” was just something you did with tea.
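
A swapped pair is easy to catch automatically when you know roughly where the points should be. The sketch below is an illustration, not part of the original Pipe: it assumes an approximate bounding box around Warwickshire (the limits are rough guesses, not official boundaries), and the function names are hypothetical. It swaps a coordinate pair only when the swapped order is the one that lands inside the box.

```python
# Approximate bounding box around Warwickshire, in decimal degrees.
# These limits are rough assumptions for illustration, not official boundaries.
LAT_MIN, LAT_MAX = 51.9, 52.7
LON_MIN, LON_MAX = -2.0, -1.1


def in_warwickshire(lat: float, lon: float) -> bool:
    """True if the point falls inside the approximate bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def fix_swapped(lat: float, lon: float) -> tuple:
    """Return (lat, lon), swapping the pair when only the swapped order
    lands in Warwickshire; raise if neither order does."""
    if in_warwickshire(lat, lon):
        return lat, lon
    if in_warwickshire(lon, lat):
        return lon, lat
    raise ValueError(f"({lat}, {lon}) is not near Warwickshire in either order")
```

For example, `fix_swapped(-1.58, 52.28)` returns `(52.28, -1.58)`, whereas plotting latitude -1.58, longitude 52.28 as given puts the point just south of the equator in the western Indian Ocean.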

As this accurately described me, I was attracted to Yahoo! Pipes because I didn’t need to learn a new programming language to get started. Yahoo! Pipes claims to be “a powerful composition tool to aggregate, manipulate, and mashup content from around the web.” And you can do it all without writing any code. It is clever stuff, but alas, after the initial wow, I soon came back down to earth. Or maybe it just wasn’t the tool best suited to the way I decided to tackle the brief, which was to develop an application that used one or more of the Warwickshire County Council Open Datasets in a new and innovative way.

I decided to look at some of the schools data to see what could be joined up and plotted on a map. The obvious choices were these:

School data

While the datasets had some duplicated information, such as address, Warwickshire Schools contained the latitude and longitude coordinates and Warwickshire Schools (Detail) didn’t. With my limited skills in web development, the challenge was to join the datasets and present the details on the map. This seemed to be an achievable target.

How Yahoo! Pipes Works

It’s fairly straightforward. Working with the Pipe Editor, you drag and drop the modules you want from the Module Library onto the Canvas, and you check your progress using the Debugger (shown below). You usually start with a source module, then do things to it by adding various other modules and linking their terminals together with “Pipes” to form a flow. As you click on a module, the output from it appears in the Debugger. To get the Pipe to run, you need to connect your final module to the Pipe Output module.
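
As a mental model only (Pipes is a visual tool, and this is not how it works internally), each module can be pictured as a function that takes a sequence of items and produces a new one, and wiring modules together as calling those functions in order. The names `run_pipe`, `keep_titled` and `shout_titles` are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# One "item" flowing through a Pipe, modelled as field name -> value.
Item = dict
# A module takes a stream of items and yields a (possibly changed) stream.
Module = Callable[[Iterable[Item]], Iterator[Item]]


def run_pipe(source: Iterable[Item], *modules: Module) -> list:
    """Push the source's items through each module in turn, like modules joined
    by Pipes on the Canvas; collecting the result stands in for Pipe Output."""
    items: Iterable[Item] = source
    for module in modules:
        items = module(items)
    return list(items)


def keep_titled(items: Iterable[Item]) -> Iterator[Item]:
    """Filter-style module: drop items that have no title."""
    return (item for item in items if item.get("title"))


def shout_titles(items: Iterable[Item]) -> Iterator[Item]:
    """Transform-style module: rewrite one field of every item."""
    for item in items:
        yield {**item, "title": item["title"].upper()}
```

For example, `run_pipe([{"title": "a"}, {"title": ""}], keep_titled, shout_titles)` returns `[{"title": "A"}]`: the untitled item is filtered out before the second module ever sees it.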

I suspect it’s my lack of expertise in this area but I struggled to grasp what it was that was actually flowing through the Pipes.

It is data, obviously, but potentially all sorts of different types of data. Pipes seem to convert most of these into a sequence of items, which I likened to records. And each record contains what could be considered to be fields, but there’s probably a grander name for them.

Certain fields seem to have special meaning to Yahoo! Pipes, yet try as I might, I couldn’t find a list of these magic fields anywhere that told you what Pipes would do with them. However, the more examples you look at, the more you find out about it all.

The key fields seem to be: title, description, and link. And bizarrely, y:location. There are probably good reasons for these names, so I’m sure someone will tell me sometime. But I haven’t been able to find out or work it out. I assume they’re some generic fields that Web feeds tend to contain.

Editor's Note: Here's information that could have helped Graham. The required elements (title, description, link, and so on) are described in the RSS 2.0 spec. This is because RSS is the most common output Pipes produces (other outputs include JSON, CSV, KML, iCal, and serialized PHP). We also have y: namespacing to normalize various syndication data structures (Atom, RDF, various RSS versions) and for our own special cases such as y:location (which produces the geoRSS tags geo:lat and geo:long).
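
To make this concrete, the sketch below builds the kind of RSS 2.0 `<item>` a feed reader or map would see: the title, description and link fields, plus `geo:lat` and `geo:long` from the W3C Basic Geo vocabulary. It uses Python's standard `xml.etree.ElementTree` and is an illustration, not Pipes' own serializer; the function name, example values and URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# W3C Basic Geo namespace, which supplies the geo:lat / geo:long elements.
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"
ET.register_namespace("geo", GEO_NS)


def rss_item(title: str, description: str, link: str,
             lat: float, lon: float) -> ET.Element:
    """Build an RSS 2.0 <item> with the 'magic' fields plus a geo position."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "description").text = description
    ET.SubElement(item, "link").text = link
    # Roughly what the Editor's Note says y:location turns into on output.
    ET.SubElement(item, f"{{{GEO_NS}}}lat").text = str(lat)
    ET.SubElement(item, f"{{{GEO_NS}}}long").text = str(lon)
    return item
```

Serializing with `ET.tostring(item, encoding="unicode")` yields the familiar `<title>`, `<description>` and `<link>` elements alongside `<geo:lat>` and `<geo:long>`.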

My Warwickshire Schools Application

To run the following app, go to my Warwickshire Schools URL.

Map showing schools

You tell it what phase of school you’re interested in, and it shows you where those schools are. The school name is displayed as the item title and the details as the item description. The school name provides a link to a page about that school on the Warwickshire County Council website.

In addition to the map view, a list view is also available, showing the title (school name) and description (details). Again, the school name is a link.

List view of schools

Pipes Editor

The Fetch CSV module gets the data, which is then filtered depending on the user input value. Various fields are then renamed. A loop is used to build a description field, although the HTML tags I included to format this failed to work.
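
The same steps can be written out as ordinary code, which may help in seeing what each module does to the items. This is a hypothetical Python equivalent, not an export of the actual Pipe: the column names, sample rows and example.org links are invented, and the nested `y:location` layout with `lat` and `lon` keys is an assumption.

```python
import csv
import io

# Hypothetical sample in the shape of a combined schools CSV; the real
# column names in the author's dataset are not given in the post.
SAMPLE_CSV = """Name,Phase,Address,Latitude,Longitude,URL
Example Primary,Primary,1 High Street,52.28,-1.58,http://example.org/a
Example High,Secondary,2 Low Road,52.37,-1.26,http://example.org/b
"""


def schools_pipe(csv_text: str, phase: str) -> list:
    # Fetch CSV: read rows as dicts keyed by the header line.
    rows = csv.DictReader(io.StringIO(csv_text))
    # Filter: keep only schools in the phase the user asked for.
    selected = [row for row in rows if row["Phase"] == phase]
    items = []
    for row in selected:
        # Rename: map dataset columns onto the fields Pipes treats specially.
        item = {
            "title": row["Name"],
            "link": row["URL"],
            "y:location": {"lat": float(row["Latitude"]),
                           "lon": float(row["Longitude"])},
        }
        # Loop: assemble a description from the remaining columns.
        item["description"] = f"{row['Address']} ({row['Phase']})"
        items.append(item)
    return items
```

Calling `schools_pipe(SAMPLE_CSV, "Secondary")` returns a single item titled `Example High` with the description `2 Low Road (Secondary)`.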

And now it’s time to confess.

Lessons Learnt

I cheated. Well, I think I did. I spent a lot of time trying to join together more than one dataset but failed to do so. With Yahoo! Pipes you can merge data feeds, but not actually join them. Taking this application as an example, I found I could combine the two datasets, but I ended up with two items for each school, one with school location data and the other with school detail data.

What I wanted was to join them on some data found in both datasets, such as school number or school name. It didn’t seem to be possible, and despite searching forums, I found no solution I could use. There seemed to be plenty of other people with the same problem, but the only suggested workaround was a kludge that meant repeatedly reading one of the data sources. I was briefly excited by the possibility of using YQL (Yahoo! Query Language) to join the data, because it says it can do it. But it actually can’t: a YQL join is something completely different.

To achieve anything useful, my only option was to copy the datasets and join the data myself, producing a new dataset which I stored on my own webspace. However, this gave me the opportunity to add some data not present in the published datasets: a link to each school’s web page on the WCC website. Without this, I wouldn’t have been able to use one of the key features of Yahoo! Pipes.
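
The post doesn’t say how the combined dataset was produced. One simple way, sketched here as an assumption rather than the author’s method, is an inner join in a few lines of Python: index one CSV by the shared column, then look up each row of the other CSV in that index. Every school present in both files then becomes one merged row, instead of the two separate items a Pipes merge produces. The `join_on_key` name, the `SchoolNumber` column and the sample rows are hypothetical.

```python
import csv
import io


def join_on_key(left_csv: str, right_csv: str, key: str) -> list:
    """Inner join two CSV texts on a shared column.

    Rows whose key appears in only one file are dropped. If the right file
    repeats a key, the last occurrence wins.
    """
    right_by_key = {row[key]: row for row in csv.DictReader(io.StringIO(right_csv))}
    joined = []
    for row in csv.DictReader(io.StringIO(left_csv)):
        match = right_by_key.get(row[key])
        if match is not None:
            # One merged record carrying both location and detail fields.
            joined.append({**row, **match})
    return joined
```

With a locations file (number, name, latitude, longitude) on the left and a details file (number, phase, address) on the right, each matched school comes out as a single row holding all of those fields.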

However, in producing the new dataset, I was surprised to find that the two WCC datasets I was using didn’t agree on how many schools there are in Warwickshire. Some schools appear in one dataset but not in the other, and some of the names are subtly different. This was disappointing; clearly, a lot of work could be saved if the data went through some sort of cleansing process before being made available. I guess that will come with time.
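
Mismatches like these can at least be listed automatically before joining. The sketch below normalises names crudely (case, apostrophes, `&` versus `and`, punctuation) and uses Python’s `difflib.get_close_matches` to pair up near-identical names. The normalisation rules, the 0.85 similarity cutoff, the function names and the example school names are all assumptions, and any suggested pairing would still need a human check.

```python
import difflib
import re


def normalise(name: str) -> str:
    """Crude normalisation: lower-case, drop apostrophes, '&' -> 'and',
    other punctuation -> space, collapse runs of whitespace."""
    name = name.lower().replace("'", "").replace("&", " and ")
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def reconcile(names_a: list, names_b: list, cutoff: float = 0.85) -> dict:
    """List names found in only one dataset, pairing up near misses."""
    norm_b = {normalise(n): n for n in names_b}
    only_a, near_misses, matched_b = [], [], set()
    for name in names_a:
        key = normalise(name)
        if key in norm_b:                      # same school after normalising
            matched_b.add(key)
            continue
        close = difflib.get_close_matches(key, norm_b.keys(), n=1, cutoff=cutoff)
        if close:                              # probably the same school
            near_misses.append((name, norm_b[close[0]]))
            matched_b.add(close[0])
        else:                                  # only in dataset A
            only_a.append(name)
    only_b = [orig for key, orig in norm_b.items() if key not in matched_b]
    return {"only_a": only_a, "only_b": only_b, "near_misses": near_misses}
```

Names that differ only in punctuation or `&` versus `and` are treated as the same school, small spelling slips are reported as near misses, and anything left over is listed as present in only one dataset.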

With my limited knowledge in this area, it seems to me that Yahoo! Pipes is less about presenting data and more about gathering data together and producing a summarized feed for use elsewhere. At least, there seemed to be little more I could do to improve the presentation. The really impressive examples written by others impressed me because of what they managed to achieve with their data sources. In that sense, it’s disappointing not to have been able to join the datasets I used.

The glimpse I had of YQL was interesting; it feels like a programmable version of Yahoo! Pipes. It will be interesting to see how both products develop as open data becomes more popular.

Editor's Note: You may find it helpful to read the following blog posts on Pipes and YQL and using YQL Execute to power Pipes. For more information on Pipes and news on the forthcoming V2 Pipes engine, see the Pipes blog. Beta testers can email their questions, with contact info, to Pipes Questions.

One Final Wow!

To demonstrate how the Pipe can be used elsewhere, here’s an example that provides KML output.

Pipes KML output

You get that just by pasting this URL to the Pipe KML feed into the Google Maps search field:

http://pipes.yahoo.com/pipes/pipe.run?_id=76c0d7cb35668ed87e9f4f8a6f2a6a89&_render=kml&phase=Secondary
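
The URL’s query string carries three values: `_id` selects the Pipe, `_render=kml` asks for KML output, and `phase` is the user input. A small helper, hypothetical and shown only to make that structure explicit, can assemble the same kind of URL for other inputs. The post only demonstrates `_render=kml` and `phase=Secondary`, so treat other render formats and phase values as assumptions.

```python
from urllib.parse import urlencode

PIPE_RUN = "http://pipes.yahoo.com/pipes/pipe.run"


def pipe_url(pipe_id: str, render: str, **params: str) -> str:
    """Build a pipe.run URL: _id picks the Pipe, _render the output format,
    and any remaining keyword arguments become the Pipe's user inputs."""
    query = {"_id": pipe_id, "_render": render, **params}
    # urlencode keeps the insertion order and escapes values as needed.
    return f"{PIPE_RUN}?{urlencode(query)}"
```

`pipe_url("76c0d7cb35668ed87e9f4f8a6f2a6a89", "kml", phase="Secondary")` reproduces the URL above exactly.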

Acknowledgements

I’m grateful to the following sources for the help they gave me in understanding Yahoo! Pipes: