Console Works, But direct link reports robots.txt error ...

This is a permalink to the YQL console and my query:

YQL Console

When I click "Test", it works.

When I use "Copy URL" and paste that URL directly into the browser, it says:

QUOTE
<error yahoo:lang="en-US">

<diagnostics>
<publiclyCallable>true</publiclyCallable>

<forbidden>
robots.txt for that domain disallows crawling for that url
</forbidden>
</diagnostics>
<description>Error Retrieving Data from External Service</description>
</error>


I tried that URL on many other computers (other IPs) and it does not work there either! How is that possible? Also, that site does not have a robots.txt at all.
What is going on?

11 Replies
  • No, sorry, this is the right link to the console:

    CONSOLE
  • Now it works BOTH WAYS!!!
    What is going on?
  • QUOTE (Deda Miloje @ Aug 27 2009, 04:15 AM)
    Now it works BOTH WAYS!!!
    What is going on?


    If you see a similar problem with another site, please let us know.

    -- Nagesh
  • Why do you need another URL? This one is perfect for testing.

    Today I had the same problem, but yesterday everything worked!
    What is going on? How is this even possible? What is the difference between the console and the URL?
  • QUOTE (Deda Miloje @ Aug 29 2009, 04:33 AM)
    Why do you need another URL? This one is perfect for testing.

    Today I had the same problem, but yesterday everything worked!
    What is going on? How is this even possible? What is the difference between the console and the URL?


    The console calls the same web service, so you should see no difference. The query you provided does appear to work fine in both the web service and the console.

    Jonathan
  • Most of the time both methods work, but I am having problems in the afternoon hours CET (for example, right now the public API does not work). Also, this site does not have a robots.txt.

    Like you said, there SHOULD be no difference between the console and the public API, and that's the THEORY. But trust me, I have no reason to lie: there IS a difference between the two, and that's a FACT.

    Does anyone have any idea how and why?
  • QUOTE (Deda Miloje @ Sep 2 2009, 06:11 AM)
    Most of the time both methods work, but I am having problems in the afternoon hours CET (for example, right now the public API does not work). Also, this site does not have a robots.txt.

    Like you said, there SHOULD be no difference between the console and the public API, and that's the THEORY. But trust me, I have no reason to lie: there IS a difference between the two, and that's a FACT.

    Does anyone have any idea how and why?


    I am having the same issue. Here is what I know.

    - When accessing my website from my home, I have never had this error even after thousands of test calls. When accessing my website from my computer at work, I get this error at least half the time.

    - The error occurs only during business hours EST, although this is the only time I am at work so this may be coincidental.

    - Sometimes the $.getJSON call takes exactly 30 seconds to fail; sometimes it fails immediately. If I wait 30 minutes or so, it starts working again.

    - Any other domain I access from the same page with YQL works fine. I have only one remote domain that causes this error.

    - Same query always works fine in console.

    - Same results with both IE and FF.

    - Robots.txt on remote domain has no exclusions.

    I believe this has something to do with the network my browser is on, although this sounds very odd and I am at a loss as to why this might be the case. The evidence, however, seems to support it.

    Here is the response:

    CODE
    <?xml version="1.0" encoding="UTF-8" ?>
    <query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="0" yahoo:created="2010-02-08T06:27:30Z" yahoo:lang="en-US" yahoo:updated="2010-02-08T06:27:30Z" yahoo:uri="http://query.yahooapis.com/v1/yql?q=select+*+from+html+where+url%3D%22+http%3A%2F%2Fremote_domain.com%22">
      <diagnostics>
        <publiclyCallable>true</publiclyCallable>
        <forbidden>robots.txt for the domain disallows crawling for url: http://remote_domain.com</forbidden>
        <user-time>3</user-time>
        <service-time>0</service-time>
        <build-version>4265</build-version>
      </diagnostics>
      <results />
    </query>
    <!-- total: 4 -->
    <!-- yqlengine3.pipes.mud.yahoo.com uncompressed/chunked Mon Feb 8 10:27:30 PST 2010 -->
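When logging success and failure, it helps to tell a robots.txt denial apart from an ordinary empty result. Below is a minimal Python sketch using the standard-library `xml.etree.ElementTree`, assuming the response layout shown above (a `<forbidden>` child inside `<diagnostics>`, and a `yahoo:count` attribute on `<query>`). The helper names are hypothetical, and other response shapes (such as the `<error>` element in the first post) are not handled.

```python
import xml.etree.ElementTree as ET

YAHOO_NS = "http://www.yahooapis.com/v1/base.rng"

# Trimmed copy of the response above (XML declaration and timing fields dropped).
SAMPLE_RESPONSE = """<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="0" yahoo:lang="en-US">
  <diagnostics>
    <publiclyCallable>true</publiclyCallable>
    <forbidden>robots.txt for the domain disallows crawling for url: http://remote_domain.com</forbidden>
  </diagnostics>
  <results />
</query>"""


def robots_denial(xml_text):
    """Return the text of diagnostics/forbidden if present, else None."""
    root = ET.fromstring(xml_text)
    forbidden = root.find("diagnostics/forbidden")
    if forbidden is None or not forbidden.text:
        return None
    return forbidden.text.strip()


def result_count(xml_text):
    """Return the yahoo:count attribute of the root <query> element as an int."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace-uri}localname".
    return int(root.get("{%s}count" % YAHOO_NS, "0"))


if __name__ == "__main__":
    # Treat a call as a robots.txt denial when <forbidden> is present and no rows came back.
    denied = robots_denial(SAMPLE_RESPONSE) is not None and result_count(SAMPLE_RESPONSE) == 0
    print(denied)  # True
```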
  • I am still experiencing this error. YQL reports that access is denied by robots.txt.

    To document it, I created a logging function to record the date/time of calls and success/failure. The results over the last 11 days are:

    From my home: 1,123 calls, 100% success
    From my work: 1,586 calls, 638 success, 946 fail
    From other: 18 calls, 15 success, 3 fail
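A logging function like the one described above can be sketched in a few lines of Python. This is a hypothetical in-memory version (the class and method names are illustrative, not the poster's code): each call records a timestamp, the location it was made from, and whether it succeeded, so the per-location counts and the time-of-day pattern mentioned earlier in the thread can both be tallied.

```python
from collections import Counter
from datetime import datetime


class CallLog:
    """Hypothetical in-memory log: one (timestamp, location, succeeded) record per call."""

    def __init__(self):
        self.records = []

    def log(self, location, succeeded, when=None):
        # Default the timestamp to "now" so callers only need to report the outcome.
        when = when if when is not None else datetime.now()
        self.records.append((when, location, bool(succeeded)))

    def summary(self):
        """Return {location: (calls, successes, failures)}."""
        calls = Counter(loc for _, loc, _ in self.records)
        ok = Counter(loc for _, loc, good in self.records if good)
        return {loc: (n, ok[loc], n - ok[loc]) for loc, n in calls.items()}

    def failures_by_hour(self):
        """Count failed calls per hour of day (local clock of the machine doing the logging)."""
        return Counter(when.hour for when, _, good in self.records if not good)
```

`summary()` gives the same calls/success/fail breakdown per location as the table above, and `failures_by_hour()` would show whether failures cluster in business hours.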

    I have the log files if a developer would like to see them.

    My network at work is AOL corporate; my network at home is Earthlink cable.

    The robots.txt file contains:

    CODE
    User-agent: *  # directed to all robots
    Disallow: /offendeduser.asp
    Disallow: /events.asp
    Disallow: /prayerlist.asp
    Disallow: /stoCheckoutPage1.asp
    Disallow: /stoCart.asp
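One way to check a robots.txt claim independently of YQL is Python's standard-library `urllib.robotparser`. This sketch feeds it the file above and asks about the root URL (the query in this thread targets `http://remote_domain.com`), one of the disallowed pages, and an empty file with no rules. It only shows how a standard parser reads these rules; it does not reproduce whatever check YQL performs internally, and the user-agent name `ExampleBot` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt contents posted above, verbatim.
ROBOTS_TXT = """\
User-agent: *  # directed to all robots
Disallow: /offendeduser.asp
Disallow: /events.asp
Disallow: /prayerlist.asp
Disallow: /stoCheckoutPage1.asp
Disallow: /stoCart.asp
"""


def allowed(robots_txt, url, agent="ExampleBot"):
    """Return True if robots_txt permits agent to fetch url.

    "ExampleBot" is a hypothetical user-agent; with only a "User-agent: *"
    group in the file, any agent name falls through to that group.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # inline "# ..." comments are stripped
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed(ROBOTS_TXT, "http://remote_domain.com/"))            # True
    print(allowed(ROBOTS_TXT, "http://remote_domain.com/events.asp"))  # False
    print(allowed("", "http://remote_domain.com/"))                    # True (no rules)
```

Under this parser, none of the `Disallow` prefixes match `/`, so the root URL is permitted.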
  • Update: after two more weeks of logging, YQL is still incorrectly reporting a robots.txt denial for about 50% of calls.
  • I have the same problem on my site, ukash: it is not indexed (only the main page is indexed), and the problem is in robots.txt. I used a program to create the robots.txt, and I think it is not working perfectly.
  • works for me: http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%20%3D%20%22http%3A%2F%2Fwww.ukashtrwin.com%2F%22&diagnostics=true
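The link above is the YQL statement percent-encoded into the `q` parameter of the public endpoint, with `diagnostics=true` appended. A small Python sketch that rebuilds that link from the plain statement; the helper name `yql_url` is hypothetical, and the sketch only builds the string without contacting the service.

```python
from urllib.parse import quote

# Endpoint as it appears in the link above.
YQL_PUBLIC_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_url(query, diagnostics=True):
    """Percent-encode a YQL statement into a GET URL shaped like the one above.

    safe="*" leaves the asterisk unescaped, matching the posted link; every
    other reserved character (space, =, ", :, /) is percent-encoded.
    """
    url = YQL_PUBLIC_ENDPOINT + "?q=" + quote(query, safe="*")
    if diagnostics:
        url += "&diagnostics=true"
    return url


if __name__ == "__main__":
    print(yql_url('select * from html where url = "http://www.ukashtrwin.com/"'))
```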
