Twitter Search returns valid ATOM; YQL declares it unfit for consumption

Hello, everyone.

First off, I'm very impressed with the potential of YQL, so I'd love some help getting it to work with Twitter Search, which it currently refuses to do.

I'm using the following query:

use 'http://www.icanhaslayout.com/twitter.search.xml' as twitter.search;
select * from twitter.search where q='twitter';

The contents of that XML file are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
    <author>Steven Merrill</author>
    <documentationURL>http://thecodemill.biz/searchmonkey/twitter.user.profile</documentationURL>
    <sampleQuery>select * from {table} where q='twitter'</sampleQuery>
  </meta>
  <bindings>
    <select itemPath="feed.entry" produces="XML">
      <urls>
        <url>http://search.twitter.com/search.atom</url>
      </urls>
      <inputs>
        <key id="q" type="xs:string" paramType="query" required="true"/>
      </inputs>
    </select>
  </bindings>
</table>
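
To check the endpoint outside YQL, it helps to know which URL this table definition should produce. The following is a minimal Python sketch, assuming that a key with paramType="query" is simply appended to the <url> as a query-string parameter (which matches the URL reported in the diagnostics). The names request_url, NS and TABLE are illustrative only and are not part of YQL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace declared on the <table> root of the definition.
NS = {"t": "http://query.yahooapis.com/v1/schema/table.xsd"}

# Trimmed copy of the table definition (the <meta> block is omitted).
TABLE = """
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <bindings>
    <select itemPath="feed.entry" produces="XML">
      <urls><url>http://search.twitter.com/search.atom</url></urls>
      <inputs>
        <key id="q" type="xs:string" paramType="query" required="true"/>
      </inputs>
    </select>
  </bindings>
</table>
"""


def request_url(table_xml, values):
    """Build the URL the select binding would request (assumed semantics)."""
    root = ET.fromstring(table_xml)
    select = root.find("t:bindings/t:select", NS)
    base = select.find("t:urls/t:url", NS).text.strip()
    params = {}
    for key in select.findall("t:inputs/t:key", NS):
        if key.get("paramType") != "query":
            continue  # only query-string keys are handled in this sketch
        name = key.get("id")
        if name in values:
            params[name] = values[name]
        elif key.get("required") == "true":
            raise ValueError(f"missing required key: {name}")
    return f"{base}?{urlencode(params)}" if params else base


print(request_url(TABLE, {"q": "twitter"}))
```

Fetching the printed URL directly (for example with curl -i) shows the status line and Content-Type that Twitter actually sends, independent of YQL.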

I have verified that Twitter Search is returning valid ATOM, yet YQL refuses to parse it, and I get errors like the following:

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="0" yahoo:created="2009-03-11T06:58:47Z" yahoo:lang="en-US" yahoo:updated="2009-03-11T06:58:47Z" yahoo:uri="http://query.yahooapis.com/v1/yql?q=use+%27http%3A%2F%2Fwww.icanhaslayout.com%2Ftwitter.search.xml%27+as+twitter.search%3B%0Aselect+*+from+twitter.search+where+q%3D%27twitter%27%3B">
<diagnostics>
<url execution-time="10">http://www.icanhaslayout.com/twitter.search.xml</url>
<publiclyCallable>true</publiclyCallable>
<error>Invalid XML document http://search.twitter.com/search.atom?q=tw...r</error>
<url execution-time="1">http://search.twitter.com/search.atom?q=twitter</url>
<url execution-time="6" http-status-code="200" http-status-message="OK">http://search.twitter.com/search.atom?q=twitter</url>
<user-time>69</user-time>
<service-time>17</service-time>
<build-version>911</build-version>
</diagnostics>
<results/>
</query>
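
For anyone scripting against YQL, the diagnostics block above can be read programmatically rather than by eye. This is a rough Python sketch, assuming the response layout shown here (unqualified diagnostics, error and url elements, plus a yahoo:count attribute on the root). The function name summarize and the trimmed RESPONSE sample are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "yahoo" prefix in the response.
YAHOO_NS = "http://www.yahooapis.com/v1/base.rng"

# Trimmed copy of the response from the post (most attributes omitted).
RESPONSE = """<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="0">
<diagnostics>
<error>Invalid XML document http://search.twitter.com/search.atom?q=tw...r</error>
<url execution-time="6" http-status-code="200" http-status-message="OK">http://search.twitter.com/search.atom?q=twitter</url>
</diagnostics>
<results/>
</query>"""


def summarize(response_xml):
    """Collect the result count, error messages, and per-URL HTTP statuses."""
    root = ET.fromstring(response_xml)
    count = int(root.get(f"{{{YAHOO_NS}}}count", "0"))
    errors = [e.text for e in root.findall("diagnostics/error")]
    statuses = [
        (u.text, u.get("http-status-code"))
        for u in root.findall("diagnostics/url")
        if u.get("http-status-code")  # skip entries without a status
    ]
    return {"count": count, "errors": errors, "statuses": statuses}


print(summarize(RESPONSE))
```

An error entry alongside a 200 status for the same URL, as in this case, suggests the upstream server answered but the body was not what the parser expected.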

Can you shed any light on why YQL chokes on valid ATOM data?

Thanks in advance for the help!

5 Replies
  • When this happens it is because we are getting HTML back from Twitter due to them blocking us. They had told us they whitelisted our IP addresses so we'll follow up with them and see if there is an issue.

    Thanks,
    Sam
  • Sam,

    Thanks for the quick reply - the service looks awesome. :)
  • the problem is still there, any progress on the subject ?
  • QUOTE (yotamatudai @ Mar 30 2009, 11:32 PM)
    the problem is still there, any progress on the subject ?


    There has been a lot of progress, but it's not terribly encouraging:

    We asked Twitter about the issue, namely that every now and again they return a fairly weird-looking response:

    CODE
    HTTP/1.1 200 OK
    Date: Wed, 11 Feb 2009 19:02:25 GMT
    Server: ---
    Content-Type: text/html; charset=UTF-8
    Cache-Control: max-age=300
    Expires: Wed, 11 Feb 2009 19:07:25 GMT
    Content-Length: 122
    Vary: Accept-Encoding
    X-Varnish: 1848104217
    Age: 0
    X-Cache-Svr: searchweb001.twitter.com
    X-Cache: MISS
    Set-Cookie: _search_twitter_sess=BAh7...; path=/

    Status: 500 Internal Server Error
    Content-Type: text/html


    You can see they appear to be hitting an internal server error on our XML request, which they return inside an HTML response with a 200 OK status. Not pretty.

    The good news was that they told us they would get the issue fixed.... and things seemed to work again - yay. .... And then it happened again. We contacted them again, and they said they'd get it fixed.... yay!... and then it happened again...

    We have since contacted them again and they have said they will get it fixed... but you can see the pattern.

    In the meantime, depending on your needs, gnip provides a nice way of monitoring twitter, e.g.:

    select * from gnip.activity where publisher='twitter'

    ...gives you everything happening in the last minute. You can go back up to 1 hour using the "bucket" parameter, see http://docs.google.com/Doc?id=dpw6zj9_0fdcnttgd#Buckets

    Jonathan
  • QUOTE
    We have since contacted them again and they have said they will get it fixed... but you can see the pattern.

    In the meantime, depending on your needs, gnip provides a nice way of monitoring twitter, e.g.:

    select * from gnip.activity where publisher='twitter'

    ...gives you everything happening in the last minute. You can go back up to 1 hour using the "bucket" parameter, see http://docs.google.com/Doc?id=dpw6zj9_0fdcnttgd#Buckets

    Jonathan


    I am also experiencing difficulties in getting data from Twitter. I tried
    CODE
    select * from atom where url='http://search.twitter.com/search.atom?q=obama'
    which gives back a 200 and an "Invalid XML document" error.

    Also the suggested detour through Gnip comes back empty....

    Any other suggestions to link Twitter into the wonderful world of YQL?

    Rene
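
The failure described in this thread (a 200 OK whose body is really an HTML error page) can be caught before the body ever reaches an XML parser. The following is a minimal Python sketch of such a check, not how YQL itself handles it. The function name is_atom_feed and the list of accepted media types are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Root element of an Atom 1.0 feed, in ElementTree's {namespace}tag notation.
ATOM_FEED_TAG = "{http://www.w3.org/2005/Atom}feed"

# Media types accepted as "probably XML" (an assumption for this sketch).
XML_MEDIA_TYPES = ("application/atom+xml", "application/xml", "text/xml")


def is_atom_feed(content_type, body):
    """Return True only if the response both claims to be XML and parses as Atom."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in XML_MEDIA_TYPES:
        return False  # e.g. text/html, as in the response quoted in the thread
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # the body is not well-formed XML
    return root.tag == ATOM_FEED_TAG


# The error response from the thread: HTML content type, non-XML body.
bad_body = b"Status: 500 Internal Server Error\nContent-Type: text/html\n"
print(is_atom_feed("text/html; charset=UTF-8", bad_body))

# A tiny well-formed Atom document for comparison.
good_body = b'<feed xmlns="http://www.w3.org/2005/Atom"><entry/></feed>'
print(is_atom_feed("application/atom+xml; charset=utf-8", good_body))
```

Checking the Content-Type first means a misreported 200 is rejected cheaply, and the parse step covers the case where the header looks right but the body does not.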