Robots.txt Not Applied Correctly

This YQL query:

select * from html where url = 'http://digg.com/users'

returns:

<forbidden>robots.txt for that domain disallows crawling for that url</forbidden>

That is not correct. The robots.txt file does not disallow access to the /users directory. It disallows access for the user agent "Referrer Karma", but agent * (every other crawler) is allowed access to the /users directory.
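The group-selection rule described above can be checked with Python's standard-library `urllib.robotparser`. This is a sketch under stated assumptions: the robots.txt text is a hypothetical, simplified stand-in (the post does not quote digg.com's actual file), and `ExampleBot/1.0` and `can_crawl` are made-up names for illustration.

```python
from urllib import robotparser

# Hypothetical, simplified robots.txt. The post only describes the structure
# of digg.com's file (a "Referrer Karma" group plus a "*" group); this is not
# the real file.
ROBOTS_TXT = """\
User-agent: Referrer Karma
Disallow: /

User-agent: *
Disallow: /search
"""


def can_crawl(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    # parse() takes the file as a list of lines. It groups Disallow rules
    # under the User-agent lines that precede them and keeps the "*" group
    # as the fallback for agents that match no named group.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://digg.com/users"
    print(can_crawl("ExampleBot/1.0", url))      # True: falls back to the * group
    print(can_crawl("Referrer Karma/2.0", url))  # False: its own group disallows /
```

A crawler that matches no named group is governed only by the `*` group, so /users stays allowed for it; only an agent matching "Referrer Karma" is blocked from the whole site.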

Yahoo Pipes has no problem accessing pages in that directory, so I'm guessing there's a bug in the way YQL matches user agents in robots.txt.
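One way a bug like the one guessed at above could arise (speculation only, not YQL's actual code) is a parser that collects every Disallow line without tracking which User-agent group it belongs to. The hypothetical sketch below contrasts that with a group-aware lookup over the same kind of simplified robots.txt; the file contents and function names are illustrative assumptions, and agent matching is simplified to an exact, case-insensitive comparison.

```python
# Hypothetical, simplified robots.txt (not digg.com's real file).
ROBOTS_TXT = """\
User-agent: Referrer Karma
Disallow: /

User-agent: *
Disallow: /search
"""


def naive_disallows(robots_txt: str) -> list[str]:
    """Buggy sketch: collect every Disallow path, ignoring User-agent groups."""
    paths = []
    for raw in robots_txt.splitlines():
        key, _, value = raw.split("#", 1)[0].partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def group_disallows(robots_txt: str, agent_token: str) -> list[str]:
    """Group-aware sketch: rules for agent_token's group, else the "*" group."""
    groups: dict[str, list[str]] = {}
    current: list[str] = []  # agents named by the group being read
    seen_rule = False
    for raw in robots_txt.splitlines():
        key, _, value = raw.split("#", 1)[0].partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                current, seen_rule = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key == "disallow" and current:
            seen_rule = True
            if value:  # an empty Disallow value allows everything
                for agent in current:
                    groups[agent].append(value)
    # Simplification: exact, case-insensitive match on the agent token.
    return groups.get(agent_token.lower(), groups.get("*", []))


def is_blocked(path: str, disallows: list[str]) -> bool:
    """True if path starts with any disallowed prefix."""
    return any(path.startswith(rule) for rule in disallows)


if __name__ == "__main__":
    print(is_blocked("/users", naive_disallows(ROBOTS_TXT)))                # True
    print(is_blocked("/users", group_disallows(ROBOTS_TXT, "ExampleBot")))  # False
```

The naive version picks up the "Referrer Karma" group's `Disallow: /` and wrongly blocks /users for everyone, which would produce exactly the kind of false "forbidden" result reported here.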

1 Reply
  • Thanks for the report. The bug has been filed and closed, and the fix will be in the next YQL release.
