Content or API providers can opt out of YQL access to their data, or restrict it, by following the instructions in the sections below.
YQL uses the robots.txt file on your server to determine which Web
pages on your site it may access. YQL uses the user-agent "Yahoo Pipes 2.0" when accessing the
robots.txt file and checks it for Allow and Disallow rules that apply to this user agent. If
the robots.txt check does not prevent YQL from accessing your content, YQL
then fetches the target page using a different user agent.
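Therefore, to deny YQL access to your content, add rules for the "Yahoo Pipes 2.0" user agent to the
relevant parts of your robots.txt. For example, a "User-agent: Yahoo Pipes 2.0" line followed by
"Disallow: /" denies YQL access to every page on the site.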
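One way to sanity-check such a rule before deploying it is to run it through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser; the robots.txt content, the example.com URL, and the "SomeOtherBot" agent are assumptions made for illustration, and Python's matching rules only approximate whatever matching YQL itself performs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that denies the YQL user agent access to the whole site.
ROBOTS_TXT = """\
User-agent: Yahoo Pipes 2.0
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The YQL user agent is denied; an unrelated agent is unaffected.
yql_allowed = is_allowed(ROBOTS_TXT, "Yahoo Pipes 2.0", "http://example.com/feed.html")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://example.com/feed.html")
print(yql_allowed, other_allowed)
```

Narrowing the Disallow path (for example to a single directory) restricts YQL only for that part of the site.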
Another approach is to block YQL on your Web server. For example, in Apache, you can add a rule to
your virtual host block in httpd.conf that denies requests whose User-Agent header contains "Yahoo Pipes 2.0".
YQL fetches content from URLs when requested by a developer. Because YQL is not a Web crawler, it does not follow the robots exclusion protocol for non-HTML data, such as XML or CSV, from a site. To stop YQL from accessing any content on your site, block the YQL user-agent (Yahoo Pipes 2.0) on your Web server.
For example, on Apache servers, add a rule to your virtual host block in
httpd.conf that rejects requests from this user agent.
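The same user-agent block can also be applied at the application layer rather than in Apache. The following is a minimal sketch of that alternative, assuming a Python WSGI application; block_yql, hello_app, and status_for are hypothetical names introduced for illustration and are not part of YQL or Apache.

```python
from wsgiref.util import setup_testing_defaults

# User-agent token that YQL is documented to send.
YQL_AGENT_TOKEN = "Yahoo Pipes 2.0"


def block_yql(app):
    """Wrap a WSGI app so that requests from the YQL user agent receive 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if YQL_AGENT_TOKEN.lower() in user_agent.lower():
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def status_for(app, user_agent):
    """Call app once with the given User-Agent and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    b"".join(app(environ, start_response))  # consume the response body
    return seen[0]


protected = block_yql(hello_app)
print(status_for(protected, "Yahoo Pipes 2.0"))
print(status_for(protected, "SomeOtherBot"))
```

Matching on a substring rather than the full header value is a design choice here: it still catches the token if it is embedded in a longer user-agent string.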
YQL lets API providers apply IP-based rate limits accurately: limits can track and count requests against the YQL developer's IP address, rather than the IP addresses of the shared proxy servers that YQL uses to access content on the Web.
For outgoing requests to external content and API providers, YQL determines the last
valid client IP address connecting to its Web service and then ensures this is the first IP
address in the X-FORWARDED-FOR HTTP header.
For example, in the X-FORWARDED-FOR HTTP header below, the request
arriving at YQL came from the 1.2.3.4 IP address. IP-rate limiters should
use this value rather than the IP addresses of YQL proxy servers.
X-FORWARDED-FOR: 1.2.3.4, 5.6.7.8, 9.10.11.12
We also set the CLIENT-IP HTTP header to this IP address.
For example:
CLIENT-IP: 1.2.3.4
Because these headers are "unsigned," they can be spoofed. Therefore, providers
should only use these headers if the proxy setting them is trusted. The IP addresses of
the proxy hosts that should be trusted can be found at
http://developer.yahoo.com/yql/proxy.txt. This file will be updated as
our proxy hosts change.
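The header handling described above can be sketched as a small helper that picks the IP address to rate-limit against. This is an illustration under stated assumptions, not a definitive implementation: rate_limit_key and TRUSTED_PROXIES are hypothetical names, the proxy address shown is a documentation placeholder, and loading the real proxy list is left out because the format of the published file is not described here.

```python
import ipaddress

# Hypothetical trusted proxy set. In practice this would be populated from
# the provider-published proxy list; 203.0.113.10 is a placeholder address.
TRUSTED_PROXIES = {ipaddress.ip_address("203.0.113.10")}


def rate_limit_key(remote_addr, headers, trusted=TRUSTED_PROXIES):
    """Return the IP address (as a string) that a rate limiter should count.

    The forwarded headers are honored only when the connecting peer is a
    trusted proxy, because the headers are unsigned and can be spoofed.
    """
    peer = ipaddress.ip_address(remote_addr)
    if peer not in trusted:
        return str(peer)

    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # The first X-FORWARDED-FOR entry is the client that connected to YQL;
    # fall back to CLIENT-IP if that header is missing or empty.
    first = normalized.get("x-forwarded-for", "").split(",")[0].strip()
    candidate = first or normalized.get("client-ip", "").strip()

    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        # Missing or malformed header value: fall back to the peer address.
        return str(peer)


example_headers = {"X-FORWARDED-FOR": "1.2.3.4, 5.6.7.8, 9.10.11.12"}
print(rate_limit_key("203.0.113.10", example_headers))
print(rate_limit_key("198.51.100.7", example_headers))
```

In the first call the peer is in the trusted set, so the forwarded client address is used; in the second the peer is not trusted, so the headers are ignored and the peer's own address is counted.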