The problem also occurs on other domains. I thought the cause might be the crawler being blocked by robots.txt, but it turns out that when you select from a path that is disallowed in robots.txt, the error message is different, as the following examples show.
A) Domain: www.greasespot.net
robots.txt:
User-agent: *
Disallow: /search
1) select * from html where url="http://www.greasespot.net"
Works fine.
2) select * from html where url="http://www.greasespot.net/search"
Returns "Error Retrieving Data from External Service"
<forbidden>robots.txt for that domain disallows crawling for that url</forbidden>
B) Domain: userscripts.org
robots.txt:
User-agent: *
Disallow: /scripts/source/
Disallow: /scripts/version/
Disallow: /scripts/diff/
Disallow: /users
Disallow: /reviews/new
Disallow: /posts/preview
1) select * from html where url="http://userscripts.org/"
<url error="Server returned HTTP response code: 500 for URL: http://userscripts.org/" execution-time="86" http-status-code="500" http-status-message="Internal Server Error"><![CDATA[http://userscripts.org/]]></url>
2) select * from html where url="http://userscripts.org/scripts/version/"
<forbidden>robots.txt for that domain disallows crawling for that url</forbidden>
C) Domain: lang-8.com
robots.txt:
User-Agent: *
Allow: /
1) select * from html where url="http://lang-8.com/"
<url error="Server returned HTTP response code: 500 for URL: http://lang-8.com/" execution-time="12" http-status-code="500" http-status-message="Internal Server Error"><![CDATA[http://lang-8.com/]]></url>
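
To double-check that robots.txt is not what blocks the failing URLs, here is a rough sketch that runs the rules quoted above through Python's standard `urllib.robotparser`. It assumes the robots.txt contents are exactly as pasted in this post, and the helper name `is_allowed` is only for illustration. The YQL crawler may not interpret robots.txt the same way this parser does, so treat the result as a sanity check rather than proof.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines permit fetching url.

    is_allowed is an illustrative helper, not part of YQL.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return parser.can_fetch(agent, url)


# robots.txt contents copied from the examples in this post
greasespot = ["User-agent: *", "Disallow: /search"]
userscripts = [
    "User-agent: *",
    "Disallow: /scripts/source/",
    "Disallow: /scripts/version/",
    "Disallow: /scripts/diff/",
    "Disallow: /users",
    "Disallow: /reviews/new",
    "Disallow: /posts/preview",
]
lang8 = ["User-Agent: *", "Allow: /"]

checks = [
    (greasespot, "http://www.greasespot.net"),
    (greasespot, "http://www.greasespot.net/search"),
    (userscripts, "http://userscripts.org/"),
    (userscripts, "http://userscripts.org/scripts/version/"),
    (lang8, "http://lang-8.com/"),
]

for rules, url in checks:
    verdict = "allowed" if is_allowed(rules, url) else "disallowed"
    print(f"{url}: {verdict}")
```

Under these rules, only the two URLs that returned `<forbidden>` come out as disallowed. The root URLs of userscripts.org and lang-8.com are allowed, which fits the observation that their HTTP 500 errors are a separate problem from robots.txt blocking.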