Looks like some of my pipes that use XPath Fetch Page have started returning 403 Forbidden, somewhat inexplicably.
Unfortunately I have no idea when this started happening.
It does not seem to have anything to do with the user-agent or with the robots.txt file.
For instance, if I try a simple pipe:
XPath Fetch Page
Extract using XPath: [blank]
Use HTML5 parser: [not checked]
Emit items as string: [checked]
Error fetching <http://mit.edu/.> Response: Forbidden (403)
(Earlier today, I saw the same problem with other well-known servers, like nytimes.com and yahoo.com. But those problems have since gone away.)
curl -A "Yahoo Pipes 1.0" <http://mit.edu/>
works just fine.
Any advice? How can I debug this? Thank you!
Oops, sorry for not posting a link. Here: http://pipes.yahoo.com/pipes/pipe.edit?_id=145ca4a83c166fb7e5b9967ec065511b
In this case it could be that they are explicitly blocking Pipes.
They're not. At least, not with robots.txt and not with user-agent blocking. I suppose I could get someone to confirm there isn't an IP block (is there a particular IP range I should ask about?), but that seems doubtful too.
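To double-check the robots.txt side locally, Python's `urllib.robotparser` can evaluate the rules against a given user-agent. Here's a minimal sketch; the robots.txt content below is made up for illustration (it is not MIT's actual file), so substitute the real contents of http://mit.edu/robots.txt to test the actual rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content -- replace with the real file fetched
# from http://mit.edu/robots.txt to test the site's actual rules.
robots_txt = """\
User-agent: Yahoo Pipes 1.0
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A Disallow rule only blocks the listed path prefix; anything not
# matched by a rule is allowed, so the site root stays fetchable.
print(rp.can_fetch("Yahoo Pipes 1.0", "http://mit.edu/"))           # True
print(rp.can_fetch("Yahoo Pipes 1.0", "http://mit.edu/private/x"))  # False
```

If `can_fetch` returns True for the root URL against the real robots.txt, then the 403 is coming from somewhere else entirely, which would leave user-agent sniffing or an IP-level block on the Pipes fetchers as the remaining suspects.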