Could be a cool YQL app if there were no "Redirected to a robots.txt restricted URL" errors on many URLs

In the past couple of days, my JavaScript app has frequently been getting robots.txt restricted URL errors on multiple websites, including Google and Yahoo. The app uses the YQL public endpoint and makes no more than 100 requests an hour. An example of the error is below. I would very much like to submit my app to your cool YQL app search post if I can get it working. The app had been stable for a while until very recently. Your help is much appreciated!

"http://www.yahooapis.com/v1/base.rng" yahoo:count="0" yahoo:created="2013-08-27T23:11:40Z" yahoo:lang="en-US"
true
<![CDATA[http://www.google.com/robots.txt]]>
<![CDATA[http://www.google.com/robots.txt]]>
An error caused the engine to disallow robots for this domain
<![CDATA[http://www.google.com/ig/calculator?q=BRL]]>
4007 8010 39253
<results/>
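
For anyone hitting the same thing, here is a minimal sketch of how a client might flag this condition so it can be logged or retried separately from a genuinely empty result. It assumes the raw response body is available as a string. The name `isRobotsRestricted` is made up for illustration and is not part of any YQL API; the function only looks for the diagnostic message quoted in the error above.

```javascript
// Hypothetical helper, not part of YQL: it only scans the raw response
// text for the diagnostic message quoted in the error above.
const ROBOTS_MESSAGE =
  "An error caused the engine to disallow robots for this domain";

function isRobotsRestricted(responseText) {
  return typeof responseText === "string" &&
    responseText.includes(ROBOTS_MESSAGE);
}
```

A string match like this is crude, but it does not depend on the exact layout of the diagnostics section, which I have not verified.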

  • EA
  • Aug 27, 2013
2 Replies
  • Just saw a similar issue in another post, with the suggestion that the source has disallowed bots from indexing the URL. In the example above, I checked Google's robots.txt and the URL is not on the disallow list. Similar situation with this URL

    An error caused the engine to disallow robots for this domain

    The source's robots policy (finance.yahoo.com/robots.txt) does not list any restriction on the URL I requested.

  • Seems like everything is running fine now. I have not hit a robots.txt error in the past 2 days with no code change on my part.

    I see the potential in YQL and would like to write more apps using the service. However, stability is a major issue, and I hope the YQL team can make QoS a top priority. At this point, I think YQL is great for experimental projects but not ready for prime time yet.
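
The manual check described in the first reply (reading a site's robots.txt and looking for the requested path) can be sketched in JavaScript. This is a simplified illustration under stated assumptions, not how YQL itself evaluates robots.txt: it only reads the `User-agent: *` group, treats rules as plain path prefixes (no `*` or `$` wildcards), and lets the longest matching rule win. The function names and the sample rules are made up for the example.

```javascript
// Simplified robots.txt check. Assumptions: only the "User-agent: *" group
// is considered, rules are plain prefixes (no "*" or "$" wildcards).
function parseStarGroup(robotsTxt) {
  const rules = [];
  let agents = [];           // user-agents of the group currently being read
  let readingAgents = false; // true while on consecutive User-agent lines
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // drop comments
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === "user-agent") {
      if (!readingAgents) agents = []; // a new group starts here
      agents.push(value);
      readingAgents = true;
    } else if (field === "allow" || field === "disallow") {
      readingAgents = false;
      // An empty Disallow value restricts nothing, so it is skipped.
      if (agents.includes("*") && value !== "") {
        rules.push({ allow: field === "allow", prefix: value });
      }
    }
  }
  return rules;
}

// Returns true when no rule matches or the best-matching rule is an Allow.
// pathAndQuery is the URL path plus query string, e.g. "/ig/calculator?q=BRL".
function isPathAllowed(robotsTxt, pathAndQuery) {
  let best = null;
  for (const rule of parseStarGroup(robotsTxt)) {
    if (!pathAndQuery.startsWith(rule.prefix)) continue;
    const longer = best === null || rule.prefix.length > best.prefix.length;
    const tieAllow = best !== null &&
      rule.prefix.length === best.prefix.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best === null || best.allow;
}

// Made-up sample rules, not copied from any real site.
const sample = [
  "User-agent: *",
  "Disallow: /search",
  "Allow: /search/about",
].join("\n");

console.log(isPathAllowed(sample, "/ig/calculator?q=BRL")); // true
console.log(isPathAllowed(sample, "/search?q=x"));          // false
console.log(isPathAllowed(sample, "/search/about"));        // true
```

If a check like this says the path is allowed and the service still reports a robots restriction, that points at the service side rather than the site's policy, which matches what was observed in this thread.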

