robots.txt error is back

YQL has been working perfectly on my web site for several months, but yesterday I started getting the "Requesting a robots.txt restricted URL" error message again, even though robots.txt does not restrict YQL.

One minute everything works perfectly, a minute later the robots.txt error occurs. Once the error starts, all calls will return the error for an hour or so.

I previously had this exact same issue when I was calling YQL directly from JavaScript in the browser, so I redesigned my app to call YQL from curl. Using curl 'fixed' the problem: I had not seen this error even once with curl. As of yesterday the error is back.

There are many threads here reporting the same issue, and none of the responses address the issue other than to say "it's working now." My previous posts on this subject never received a developer response.

This is a simple issue: either robots.txt excludes YQL or it doesn't, and YQL shouldn't be inconsistent about it.

BTW, the site I am having trouble accessing with YQL is a Yahoo! web site.
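The claim that robots.txt either excludes a crawler or it doesn't can be checked locally. Below is a minimal sketch using only Python's standard `urllib.robotparser`: `robots_url_for` derives the robots.txt URL that governs a page (the file is scoped per scheme and host, so `www.example.com` and `subdomain.example.com` are governed by different files), and `is_allowed` evaluates robots.txt text you have already fetched for a given user agent. Both helper names and the user-agent strings are hypothetical; the thread does not say which User-Agent YQL sent, and this does not reproduce YQL's own check.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    robots.txt is scoped per scheme + host (+ port), so www.example.com and
    subdomain.example.com are each governed by their own file.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(robots_txt: str, user_agent: str, page_url: str) -> bool:
    """Evaluate already-fetched robots.txt text for one user agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    # Hypothetical robots.txt: blocks "ExampleBot" entirely, allows everyone else.
    sample = "User-agent: ExampleBot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"
    print(robots_url_for("http://subdomain.example.com/some/page"))
    print(is_allowed(sample, "ExampleBot/1.0", "http://www.example.com/"))
    print(is_allowed(sample, "SomeOtherBot", "http://www.example.com/"))
    # An empty robots.txt places no restrictions on any agent.
    print(is_allowed("", "ExampleBot", "http://www.example.com/"))
```

Fetching the file (for example with `RobotFileParser.set_url` and `read`) is left out so the logic can be tested offline; a missing robots.txt is treated by `read` as "allow all".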

12 Replies
  • Replying to mark (Jul 16 2010, 07:09 AM):

    Hi Mark,

    Can you share the URL you're trying to reach? Any example query would be great. We'll investigate and provide a fix.

    thanks,
    Nagesh
  • Replying to Nagesh Susarla (Jul 16 2010, 08:58 AM):

    Hi Nagesh.

    Here is the query in the console. I just checked and it is failing right now.

    console query
  • Replying to mark (Jul 16 2010, 09:54 AM):

    Hi Mark,

    I'll need a bit more information, since I'm able to get the response without issues from the console. Can you run the following curl command in a shell and tell me which server you hit?

    CODE
    curl "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Frivals.com%22&diagnostics=true"


    You should see something like the following at the end of the response:

    CODE
    <!-- yqlengine1.pipes.sp1.yahoo.com uncompressed/chunked Fri Jul 16 16:46:46 PDT 2010 -->


    -- Nagesh
  • Replying to Nagesh Susarla (Jul 16 2010, 03:49 PM):

    Nagesh,

    Here is the end of the response:

    CODE
    <!-- yqlengine1.pipes.re4.yahoo.com uncompressed/chunked Fri Jul 16 17:19:10 PDT 2010 -->
  • This error is occurring again today. YQL is incorrectly reporting robots.txt exclusion.

    This query works:

    CODEBOX
    select * from html where url="http://www.example.com"


    This query does not work:

    CODEBOX
    select * from html where url="http://subdomain.example.com"


    There is no robots.txt on subdomain.example.com and the YQL console response is:

    CODEBOX
    error="Requesting a robots.txt restricted URL: http://subdomain.example.com/"
  • I am also having this issue now. Sometimes, seemingly at random, I get the robots.txt error on URLs that did not produce it before. When it appears, it persists for a while on URLs that I have had working and know should work fine (no robots.txt).

    CODE
    error="Redirected to a robots.txt restricted URL: http://www.fabstuff.net/catalog/145"


    When I retry the same URLs after this happens, that error goes away and I get the following error instead, even though the site is clearly still accessible in my browser, so I know the errors from YQL are wrong!

    CODE
    Connect Failure


    To test for yourself, use:

    CODE
    use "store://S1QsV6sU4JcsSNtDn9U8T1" as fabstuff_search; select * from fabstuff_search where url = 'http://www.fabstuff.net/catalog/145';


    Thanks, any reply is much appreciated. It's so annoying, as I can't do anything until I get past this ;-)
  • A better/alternative query to check with is this one:

    CODE
    use "store://Kl8rVJym92l5YxnnNqC88D" as fabstuff; select * from fabstuff;


    It may time out after 30 seconds on some attempts, though, as I have switched off the caching for now.
  • I can't reproduce this error; I always get results.

    It could be that fab.com has one of its servers configured incorrectly.

    Thanks,
    Paul
    YQL Team
  • Replying to Paul Donnelly (19 Mar 2012 1:45 PM):

    Hi Paul, thanks for your reply. Frustratingly, it seems to be working now! (It is an intermittent issue, hence the post.) Soon it will start complaining about robots.txt again for no good reason.

    Hopefully you can catch it when it does?
  • It's doing it with this query NOW:

    CODE
    select * from html where url = 'http://www.fabstuff.net/products/682';
  • CODE
    <?xml version="1.0" encoding="UTF-8"?>
    <query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
        yahoo:count="0" yahoo:created="2012-03-19T22:03:01Z" yahoo:lang="en-US">
        <diagnostics>
            <publiclyCallable>true</publiclyCallable>
            <url
                error="Redirected to a robots.txt restricted URL: http://www.fabstuff.net/products/682"
                execution-start-time="1" execution-stop-time="3"
                execution-time="2" http-status-code="403"
                http-status-message="Forbidden" proxy="DEFAULT"><![CDATA[http://www.fabstuff.net/products/682]]></url>
            <user-time>3</user-time>
            <service-time>2</service-time>
            <build-version>25587</build-version>
        </diagnostics>
        <results/>
    </query>
  • And when it works (modified slightly so I can post on here):

    CODE
    <?xml version="1.0" encoding="UTF-8"?>
    <query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
        yahoo:count="1" yahoo:created="2012-03-20T07:36:08Z" yahoo:lang="en-US">
        <diagnostics>
            <publiclyCallable>true</publiclyCallable>
            <url execution-start-time="1" execution-stop-time="608"
                execution-time="607" proxy="DEFAULT"><![CDATA[http://www.fabstuff.net/products/682]]></url>
            <user-time>618</user-time>
            <service-time>607</service-time>
            <build-version>25587</build-version>
        </diagnostics>
        <results>
            <body marginheight="0" marginwidth="0">
                <script src="menu_setting.js" type="text/javascript" xml:space="preserve"/>
                <script src="menu_arr_en.js" type="text/javascript" xml:space="preserve"/>
                <script src="menu_com.js" type="text/javascript" xml:space="preserve"/>
                <script type="text/javascript" xml:space="preserve">
                                function Go(){return}
        </script>
                <table align="center" border="0" cellpadding="0"
                    cellspacing="0" width="972">
                    <tr valign="top">
                       .....
                                      </td>
                                  </tr>
                              </table>
                          </td>
            .......................................
                    </tr>
                </table>
            </body>
        </results>
    </query>

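Nagesh's diagnostic step in this thread (request the query with `diagnostics=true` and read the HTML comment at the end of the response, which names the YQL engine host that served it) can be scripted so that every call records its engine. The two trailers posted in the thread came from different hosts (`yqlengine1.pipes.sp1.yahoo.com` and `yqlengine1.pipes.re4.yahoo.com`), so logging the host next to each robots.txt error might show whether failures cluster on particular engines; that correlation is an assumption, not something the thread establishes. A minimal sketch: `yql_url` and `serving_engine` are hypothetical helper names, the endpoint is the one from Nagesh's curl command, and the code only builds the URL and parses response text you have already saved; it does not contact the service.

```python
import re
from typing import Optional
from urllib.parse import quote

# Public endpoint used in Nagesh's curl command.
YQL_PUBLIC_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_url(statement: str, diagnostics: bool = True) -> str:
    """Build a YQL request URL, percent-encoding the statement as in the curl example."""
    # safe="*" leaves "*" literal; spaces, "/", ":", "=" and '"' are percent-encoded.
    url = f"{YQL_PUBLIC_ENDPOINT}?q={quote(statement, safe='*')}"
    if diagnostics:
        url += "&diagnostics=true"
    return url


def serving_engine(response_text: str) -> Optional[str]:
    """Return the first token of the last HTML comment (the engine host), or None."""
    hosts = re.findall(r"<!--\s*(\S+)", response_text)
    return hosts[-1] if hosts else None


if __name__ == "__main__":
    print(yql_url('select * from html where url="http://rivals.com"'))
    trailer = "<!-- yqlengine1.pipes.re4.yahoo.com uncompressed/chunked Fri Jul 16 17:19:10 PDT 2010 -->"
    print(serving_engine(trailer))
```

Taking the last comment rather than the first matters because an `html` table result can itself contain HTML comments from the scraped page.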
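The failing and working diagnostics dumps posted for `http://www.fabstuff.net/products/682` differ in the `<url>` element: the failing one carries `error` and `http-status-code="403"` attributes and `yahoo:count="0"`, while the working one has neither attribute and `yahoo:count="1"`. A minimal sketch, using Python's standard `xml.etree.ElementTree`, that pulls those fields out of a saved response so failing calls can be logged and compared over time. `summarize_yql_response` is a hypothetical name, and flagging any error containing "robots.txt restricted URL" as a robots.txt block is an assumption based only on the two messages quoted in this thread. Because `yahoo:count` is a namespaced attribute, ElementTree exposes it under the key `{http://www.yahooapis.com/v1/base.rng}count`.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "yahoo:" prefix in the YQL responses quoted in the thread.
YAHOO_NS = "http://www.yahooapis.com/v1/base.rng"


def summarize_yql_response(xml_text: str) -> dict:
    """Extract the result count and per-URL fetch diagnostics from a YQL XML response."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attributes as "{namespace}local-name".
    count = int(root.get(f"{{{YAHOO_NS}}}count", "0"))
    fetches = []
    for url_el in root.findall("./diagnostics/url"):
        error = url_el.get("error")
        fetches.append({
            "url": (url_el.text or "").strip(),  # CDATA content is exposed as .text
            "error": error,
            "status": url_el.get("http-status-code"),
            # Assumption: both robots.txt messages seen in this thread contain this phrase.
            "robots_blocked": error is not None and "robots.txt restricted URL" in error,
        })
    return {"count": count, "fetches": fetches}
```

Run against the failing dump above, this would report a count of 0 and one fetch with status `"403"` and `robots_blocked` set; against the working dump, a count of 1 and no error.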