Mashing up the web is fun, and it lets you unearth relationships and meaning in data by applying a new angle, which could be as simple as plotting the data on a map.
The problem we have is finding relevant data: user-generated content is very up to date but not necessarily high quality. This kept journalists wary of the whole Web 2.0 thing for quite a while. For some, however, the read-write web is an opportunity rather than a bunch of amateurs trying to take away "real" journalists' jobs. The Guardian has now put a massive stake in the ground to show this.
The Guardian Open Platform contains two main items: the Data Store, which is a repository of all the public data gathered by The Guardian for their newspaper analysis pieces, and the Content API, which is a search over all the Guardian content.
In essence, both of these pieces give you access to all the data behind what you just read online or in the newspaper and let you play with it yourself.
The Content API is a RESTful API that allows you to search data either with a full-text search or by tag. Unlike most of the APIs out there, these tags are not user generated but are added to the content by professional journalists. That makes it immensely easy to narrow down to the content you want by applying more and more filters to your search.
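To give an idea of what a full-text search with stacked tag filters could look like as a request, here is a minimal sketch. Everything specific in it is an assumption for illustration only: the `api.example.com` endpoint is a placeholder, and the parameter names `q`, `filter`, `format` and `api_key` are hypothetical, so check the official documentation for the real ones.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): not the real Open Platform URL.
BASE_URL = "https://api.example.com/content/search"


def build_search_url(query, tags=(), fmt="json", api_key="YOUR-KEY"):
    """Build a search URL from a full-text query plus zero or more tag filters.

    The parameter names used here are hypothetical.
    """
    params = [("q", query), ("format", fmt), ("api_key", api_key)]
    # One "filter" parameter per tag; each extra tag narrows the search further.
    params += [("filter", tag) for tag in tags]
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("environment", tags=["politics", "climate-change"])
print(url)
```

Passing a longer list of tags to the hypothetical `build_search_url` helper mirrors the idea of applying more and more filters to the same search.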
The easiest way of doing this is to use the API explorer and click the tags on the right to add filters one by one. For example, there are 30005 results for "environment":
Clicking and applying the tag filters on the right for "politics,greenpolitics,environment,climate-change,uk,transport" gives us 64 results:
These tags are also available as arrays in each result, so you don't need to filter by hand in the API explorer.
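Because each result carries its tags as an array, you can do the same narrowing in your own code. The sketch below assumes a simplified result shape, a dictionary with a `title` and a `tags` list of strings. That shape and the sample data are made up for illustration, and the real response format may well differ.

```python
def filter_by_tags(results, required_tags):
    """Keep only the results whose tag list contains every required tag."""
    required = set(required_tags)
    return [r for r in results if required <= set(r.get("tags", []))]


# Made-up sample data in an assumed, simplified shape.
sample_results = [
    {"title": "Article A", "tags": ["environment", "politics", "uk"]},
    {"title": "Article B", "tags": ["environment", "transport"]},
    {"title": "Article C", "tags": ["politics"]},
]

matches = filter_by_tags(sample_results, ["environment", "politics"])
print([r["title"] for r in matches])
```

Requiring every tag to be present (a subset check) matches the way stacking filters in the explorer shrinks the result set.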
The API is fully REST based and returns data in XML, Atom and JSON. There are demo SDKs available in Java, PHP, Ruby and Python. Currently the Open Platform is available as a public beta and you have to apply for a developer key, which is issued by hand, so give them some time to do so.
Once you are in, the sky is the limit, though. As one of the demo applications for the launch, I mixed the API with BOSS using YQL to build a news mixer mashup. I will go into details about this in another blog post.
The data geek in me is very excited about the opportunities this new API and data store bring, and the Yahoo in me has already started writing the Open Data Tables for YQL to mix and match it easily with our services. So I say cheers for yet another great data resource to play with and hack on:
Yahoo Developer Network