Robots.txt Not Applied Correctly
This YQL query: select * from html where url = 'http://digg.com/users'
Returns <forbidden>robots.txt for that domain disallows crawling for that url</forbidden>
That is not true. The robots.txt file does not disallow access to the /users directory. It disallows access for the agent "Referrer Karma", but agent * is allowed to access the /users directory.
Yahoo Pipes has no problem accessing pages in that directory, so I'm guessing there's a bug in the way YQL matches user agents in robots.txt.
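For anyone who wants to check the expected behaviour locally, here is a rough sketch using Python's standard urllib.robotparser. The robots.txt content below is a made-up sample modelled on my description above, not a copy of the real digg.com file, and the helper name is_allowed and the agent strings are just for illustration. It only shows how per-agent groups are normally meant to apply; it says nothing about how YQL parses the file internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (assumption): one group that blocks the
# "Referrer Karma" agent entirely, and a catch-all group that leaves
# /users open. This is NOT the actual digg.com file.
SAMPLE_ROBOTS_TXT = """\
User-agent: Referrer Karma
Disallow: /

User-agent: *
Disallow: /private
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://digg.com/users"
    # Agent strings are illustrative placeholders.
    for agent in ("Referrer Karma/2.0", "SomeOtherBot/1.0"):
        print(agent, "->", is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

With a file shaped like this, only the agent named in the first group should be refused; any other agent falls through to the * group and should be allowed to fetch /users.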