Robots.txt Not Applied Correctly
This YQL query: select * from html where url = 'http://digg.com/users'
Returns <forbidden>robots.txt for that domain disallows crawling for that url</forbidden>
That is not true. The robots.txt file does not disallow access to the /users directory. It disallows access for the agent "Referrer Karma", but agent * is allowed access to the /users directory.
Yahoo Pipes has no problem accessing pages in that directory, so I'm guessing there's a bug in the way YQL matches user agents in robots.txt.
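For anyone who wants to check the expected behaviour independently, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt text below is a hypothetical sample modeled on the description above (one block that blocks "Referrer Karma", one block for * that leaves /users open); it is not a copy of the real digg.com file, and the agent name "ExampleBot" and the helper is_allowed are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt modeled on the report above.
# This is NOT the actual digg.com file; the rules are assumptions.
SAMPLE_ROBOTS_TXT = """\
User-agent: Referrer Karma
Disallow: /

User-agent: *
Disallow: /private
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://digg.com/users"
    # "ExampleBot" is a made-up agent that only the * block applies to.
    for agent in ("ExampleBot", "Referrer Karma"):
        print(agent, "->", is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

With rules like these, a crawler covered only by the * block should be allowed to fetch /users, while only the named agent is refused. A parser that applies the "Referrer Karma" block to every agent would produce the forbidden response quoted above.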
Mar 14, 2009
Mar 16, 2009
Thanks for the report. The bug has been filed and closed and will be fixed in the next YQL release.