robots.txt restriction not working anymore?

Does YQL still use robots.txt?

For example, I was testing this query: select * from html where url="http://seoyourblog.com/"

http://query.yahooapis.com/v1/public/yql?q...rblog.com%2F%22

It seems to fetch the page, even though http://seoyourblog.com/robots.txt has the following lines:

User-agent: Yahoo Pipes 1.0
Disallow: / User-agent: Yahoo Pipes 2.0
Disallow: /

6 Replies
  • Looks like an issue. We'll take a look at it and get it fixed ASAP.
  • I took a look at the robots.txt. The current version (http://seoyourblog.com/robots.txt) has the Yahoo Pipes 2.0 User-agent on the same line as the previous Disallow, so we are not recognizing Yahoo Pipes 2.0 as a valid user-agent:

    User-agent: Yahoo Pipes 1.0
    Disallow: / User-agent: Yahoo Pipes 2.0
    Disallow: /

    There should be a line break between the Disallow and the next User-agent definition, like so:

    User-agent: Yahoo Pipes 1.0
    Disallow: /
    User-agent: Yahoo Pipes 2.0
    Disallow: /

    Not sure if you are the site owner for this domain but it looks like that is the reason traffic is being allowed by YQL.

    --Josh
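
To illustrate the point above, here is a minimal Python sketch of a line-by-line robots.txt reader. It is not YQL's actual parser; the function name `parse_robots` and its grouping logic are assumptions made for illustration. It shows why a User-agent that shares a line with the previous Disallow is never registered as its own record.

```python
def parse_robots(text):
    """Group Disallow values by User-agent, reading one directive per line.

    Hypothetical helper for illustration only; not YQL's real implementation.
    """
    rules = {}            # user-agent name -> list of Disallow values
    current_agents = []   # agents that the next Disallow lines apply to
    last_was_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        # Only the first colon separates the field name from its value,
        # so anything after it stays part of the value.
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if last_was_rule:
                current_agents = []           # a new record starts here
            current_agents.append(value)
            rules.setdefault(value, [])
            last_was_rule = False
        elif field == "disallow":
            for agent in current_agents:
                rules[agent].append(value)
            last_was_rule = True
    return rules


broken = (
    "User-agent: Yahoo Pipes 1.0\n"
    "Disallow: / User-agent: Yahoo Pipes 2.0\n"
    "Disallow: /\n"
)
fixed = (
    "User-agent: Yahoo Pipes 1.0\n"
    "Disallow: /\n"
    "User-agent: Yahoo Pipes 2.0\n"
    "Disallow: /\n"
)

print(sorted(parse_robots(broken)))  # ['Yahoo Pipes 1.0']
print(sorted(parse_robots(fixed)))   # ['Yahoo Pipes 1.0', 'Yahoo Pipes 2.0']
```

In the broken file, the text "User-agent: Yahoo Pipes 2.0" is swallowed into the Disallow value of the first record, so no rule is ever attached to that agent.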
  • Thanks for the answer Josh :)
  • As a side note, this query is being correctly blocked: select * from html where url="http://siteriver.com/". The robots.txt instructions regarding Yahoo Pipes seem to be identical.
  • It looks like your star (*) User-agent Allow is conflicting with your Pipes Disallows. The "/" path is both allowed by the star User-agent and disallowed by the Pipes agents, which makes the configuration ambiguous. Try removing the Allow "/" on the star User-agent.

    We cache the robots.txt for 1 hour, but if you pass in a query parameter of debug=true, we will bypass the cache and load the robots.txt each time. That should help with cache issues during development. This section of the developer's guide about debugging also applies to the robots.txt:

    http://developer.yahoo.com/yql/guide/yql-n...rk-logging.html
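
As a rough sketch of the debug=true tip above, the following Python snippet builds a public YQL request URL with the cache-bypass parameter appended. The endpoint and the `q` and `debug` parameters come from this thread; the helper name `yql_url` is an assumption for illustration, and the snippet only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Public endpoint as it appears in the query URL quoted earlier in this thread.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_url(query, debug=False):
    """Return a YQL request URL (hypothetical helper, for illustration).

    With debug=True the debug=true parameter is added, which per the reply
    above bypasses the 1-hour robots.txt cache.
    """
    params = {"q": query}
    if debug:
        params["debug"] = "true"
    return YQL_ENDPOINT + "?" + urlencode(params)


url = yql_url('select * from html where url="http://seoyourblog.com/"', debug=True)
print(url)
```

Leaving `debug` off for normal traffic keeps the cached robots.txt in use; it is only worth adding while you are editing the file and want each test query to pick up the latest version.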
  • It finally worked, thank you Josh :)