Geolocating web sites with Yahoo! Placemaker

Yahoo Placemaker is a new web service from the Yahoo Geo team. It is a location extraction service that returns all the geographical locations it recognises in plain text, an RSS feed, or an HTML document.

The service is pretty easy to use. Here's a quick step-by-step guide to extracting the locations mentioned on a web site using PHP and cURL. You can see the demo and try it out for yourself.

To get the locations from a web URL, you need to call the API endpoint http://wherein.yahooapis.com/v1/document using POST and provide four parameters:

  • the location of the web document as documentURL, for example http://uk.yahoo.com,
  • the documentType which could be text/plain, text/html, text/xml, text/rss, application/xml or application/rss+xml,
  • the format of the document you want back (XML or RSS) as outputType and
  • your application ID as appid.
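
As a rough sketch of how these four parameters fit together, the helper below builds the POST body for any document URL. The function name placemaker_post_body and its two sanity checks are illustrative assumptions, not part of the Placemaker API. It relies on http_build_query so that every value is URL-encoded before it is sent.

```php
<?php
// Hypothetical helper (not part of the Placemaker API): builds the
// URL-encoded POST body from the four parameters listed above.
function placemaker_post_body($appid, $documentURL, $documentType, $outputType)
{
    // Reject anything that is not a syntactically valid URL.
    if (filter_var($documentURL, FILTER_VALIDATE_URL) === false) {
        throw new InvalidArgumentException('Not a valid URL: ' . $documentURL);
    }
    // The text above names XML and RSS as the two output formats.
    if (!in_array($outputType, array('xml', 'rss'), true)) {
        throw new InvalidArgumentException('outputType must be xml or rss');
    }
    // http_build_query() URL-encodes every value; '&' is passed explicitly
    // so the result does not depend on the arg_separator.output ini setting.
    return http_build_query(array(
        'appid'        => $appid,
        'documentURL'  => $documentURL,
        'documentType' => $documentType,
        'outputType'   => $outputType,
    ), '', '&');
}

echo placemaker_post_body('YOUR_APP_ID', 'http://uk.yahoo.com', 'text/html', 'xml');
```

Encoding matters as soon as the document URL carries its own query string: an unencoded & or = in it would otherwise be read as the start of another POST parameter.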

In PHP using cURL it looks like this:


<?php
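// Your Yahoo application ID and the Placemaker endpoint.
$key = 'PASTE YOUR API KEY HERE';
$apiendpoint = 'http://wherein.yahooapis.com/v1/document';
// The document to analyse, its MIME type and the response format.
$url = 'http://uk.yahoo.com';
$inputType = 'text/html';
$outputType = 'xml';
// http_build_query() URL-encodes each value, so a document URL that
// contains & or = cannot corrupt the POST body.
$post = http_build_query(array(
'appid' => $key,
'documentURL' => $url,
'documentType' => $inputType,
'outputType' => $outputType
));
$ch = curl_init($apiendpoint);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
// Return the response as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$results = curl_exec($ch);
// Pass the XML response straight through to the browser.
header('content-type:text/xml');
echo $results;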
?>

You can try it out for yourself. The returned XML contains all kinds of information, which is explained in detail in the docs.

The interesting bits are really the places that the API found. Each one is listed in a placeDetails element, which in turn contains a place element:

[Image: Placemaker place details]

Using simplexml you can easily turn these into PHP objects and loop over them. Be aware that the XML wraps its content in CDATA sections, so pass the LIBXML_NOCDATA option to simplexml_load_string to have them merged into plain text nodes.


<?php
$key = 'PASTE YOUR API KEY HERE';
$apiendpoint = 'http://wherein.yahooapis.com/v1/document';
$url = 'http://uk.yahoo.com';
$inputType = 'text/html';
$outputType = 'xml';
// http_build_query() URL-encodes each value for the POST body.
$post = http_build_query(array(
'appid' => $key,
'documentURL' => $url,
'documentType' => $inputType,
'outputType' => $outputType
));
$ch = curl_init($apiendpoint);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$results = curl_exec($ch);
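// LIBXML_NOCDATA merges the CDATA sections into plain text nodes.
$places = simplexml_load_string($results, 'SimpleXMLElement',
LIBXML_NOCDATA);
echo '<h2>Results</h2>';
// simplexml_load_string() returns false if the response is not valid XML.
if($places !== false && $places->document->placeDetails){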
echo '<table>';
echo '<caption>Locations for '.$url.'</caption>';
echo '<thead><tr>';
echo '<th scope="col">Name</th>';
echo '<th scope="col">Type</th>';
echo '<th scope="col">woeid</th>';
echo '<th scope="col">Latitude</th>';
echo '<th scope="col">Longitude</th>';
echo '</tr></thead>';
echo '<tbody>';
foreach($places->document->placeDetails as $p){
echo '<tr>';
echo '<td>'.$p->place->name.'</td>';
echo '<td>'.$p->place->type.'</td>';
echo '<td>'.$p->place->woeId.'</td>';
echo '<td>'.$p->place->centroid->latitude.'</td>';
echo '<td>'.$p->place->centroid->longitude.'</td>';
echo '</tr>';
}
echo '</tbody></table>';
} else {
echo '<h2>Couldn\'t find any locations for '.$url.'</h2>';
}
?>

You can try this out for yourself too. Adding a few more lines of code and a bit of styling makes it easy to use for any URL.
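
To experiment with the parsing loop without an API key, the following sketch runs the same simplexml calls against a hand-written fragment. The sample XML is an assumption: it is a simplified stand-in that only mirrors the element names the script above reads (document, placeDetails, place, name, type, woeId, centroid). The root element name and all values are placeholders, and a real response carries more fields. The function name extract_places is likewise made up for this example.

```php
<?php
// Hand-written stand-in for a Placemaker response (an assumption, not
// captured API output). Only the elements read by the loop are included.
$sample = <<<XML
<response>
  <document>
    <placeDetails>
      <place>
        <woeId>12345</woeId>
        <type>Town</type>
        <name><![CDATA[Example Town]]></name>
        <centroid>
          <latitude>51.5</latitude>
          <longitude>-0.1</longitude>
        </centroid>
      </place>
    </placeDetails>
    <placeDetails>
      <place>
        <woeId>67890</woeId>
        <type>Town</type>
        <name><![CDATA[Sample City]]></name>
        <centroid>
          <latitude>48.8</latitude>
          <longitude>2.3</longitude>
        </centroid>
      </place>
    </placeDetails>
  </document>
</response>
XML;

// Collects one row per placeDetails element as plain PHP strings.
function extract_places($xml)
{
    $doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
    $rows = array();
    // Bail out on malformed XML or when no places were found.
    if ($doc === false || !isset($doc->document->placeDetails)) {
        return $rows;
    }
    foreach ($doc->document->placeDetails as $p) {
        $rows[] = array(
            'name'      => (string) $p->place->name,
            'type'      => (string) $p->place->type,
            'woeid'     => (string) $p->place->woeId,
            'latitude'  => (string) $p->place->centroid->latitude,
            'longitude' => (string) $p->place->centroid->longitude,
        );
    }
    return $rows;
}

foreach (extract_places($sample) as $row) {
    echo implode(' | ', $row), "\n";
}
```

Returning plain arrays rather than echoing HTML inside the loop keeps the parsing separate from the markup, so the same rows could feed the table above or any other output.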

Another interesting part of the XML is the referenceList, which gives you all the matches Placemaker found in the document and, in the case of an HTML document, the XPath pointing to each of them.

Happy Hacking!

Chris Heilmann
Yahoo Developer Network