Limiting Access to Content Provider Data

Content or API providers can opt out of, or restrict, YQL access to their data by following the instructions in the sections below. Please remember that it is your responsibility to obtain the necessary permissions from the content or API providers to use their content or services, separate from your use of YQL: neither Yahoo nor your use of YQL covers those permissions.

Blocking HTML Data Scraping from YQL

YQL uses the robots.txt file on your server to determine which Web pages on your site it may access. YQL uses the user-agent "Yahoo Pipes 2.0" when accessing the robots.txt file and checks it for allow and disallow rules that apply to this user agent. If the robots.txt check does not prevent YQL from accessing your content, YQL then fetches the target page using a different user agent:

Therefore, to deny YQL access to your content, simply add "Yahoo Pipes 2.0" to the relevant parts of your robots.txt. For example:
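
As a minimal sketch, the Python snippet below embeds one possible robots.txt rule for the "Yahoo Pipes 2.0" user agent and checks it with the standard library's urllib.robotparser. The "Disallow: /" path, the example.com URL, and the is_allowed helper are illustrative assumptions. Python's matching logic is also not guaranteed to be identical to the check YQL performs.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content. Blocking "/" (the whole site) is an
# assumption; narrow the path to block only part of a site.
ROBOTS_TXT = """\
User-agent: Yahoo Pipes 2.0
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL used only for the check.
    print(is_allowed("Yahoo Pipes 2.0", "http://example.com/page.html"))
    print(is_allowed("SomeOtherBot", "http://example.com/page.html"))
```

Running a check like this before deploying a robots.txt change is one way to confirm that the rule targets only the intended user agent.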

Another approach is to block YQL on your Web server. For example, in Apache, add this to your virtual host block in httpd.conf:

Blocking Non-HTML from YQL

YQL fetches content from URLs when requested by a developer. Because YQL is not a Web crawler, it does not follow the robots exclusion protocol for non-HTML data, such as XML or CSV, from a site. To stop YQL from accessing any content on your site, block the YQL user-agent (Yahoo Pipes 2.0) on your Web server.
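
Where content is served by a Python WSGI application rather than directly by the Web server, the same user-agent block could be sketched as middleware. This is an illustration under stated assumptions, not a Yahoo-provided configuration: block_yql, demo_app, and status_for are hypothetical names, and the substring match on "Yahoo Pipes 2.0" assumes that string appears in the User-Agent header of the request.

```python
from wsgiref.util import setup_testing_defaults

# User-agent token named in this document; matched case-insensitively.
BLOCKED_AGENT = "Yahoo Pipes 2.0"


def block_yql(app):
    """Wrap a WSGI app so requests from the blocked agent get a 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_AGENT.lower() in user_agent.lower():
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Hypothetical application that serves some non-HTML data."""
    body = b"id,value\n1,42\n"
    start_response("200 OK", [
        ("Content-Type", "text/csv"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(user_agent: str) -> str:
    """Return the status line the wrapped demo app sends for a user agent."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)

    list(block_yql(demo_app)(environ, start_response))
    return statuses[0]
```

Because the check runs on every request regardless of content type, it covers XML, CSV, and other non-HTML responses that robots.txt rules do not protect.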

For example, on Apache servers, add this rule to your virtual host block in httpd.conf:

Rate Limiting by IP Address

YQL lets API providers apply IP-based rate limits accurately: limits are tracked and counted against the YQL developer's IP address, rather than the IP addresses of the shared proxy servers that YQL uses to access content on the Web.

For outgoing requests to external content and API providers, YQL determines the last valid client IP address connecting to its Web service and then ensures this is the first IP address in the X-FORWARDED-FOR HTTP header.

For example, in the X-FORWARDED-FOR HTTP header below, the request arriving at YQL came from the IP address. IP-rate limiters should use this value rather than the IP addresses of YQL proxy servers.


We also set the CLIENT-IP HTTP header to this IP address.

For example:



Because these headers are "unsigned," they can be spoofed. Therefore, providers should only use these headers if the proxy setting them is trusted. The IP addresses of the proxy hosts that should be trusted can be found at This file will be updated as our proxy hosts change.
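
The trust rule above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: rate_limit_key is a hypothetical helper, the addresses in TRUSTED_PROXIES are placeholders from the reserved documentation ranges rather than real YQL proxy hosts, and headers is assumed to be a plain dict keyed by the exact name "X-Forwarded-For".

```python
import ipaddress

# Placeholder addresses (reserved documentation range), NOT real YQL
# proxies. Populate this set from the provider's published proxy list.
TRUSTED_PROXIES = frozenset({
    ipaddress.ip_address("192.0.2.10"),
    ipaddress.ip_address("192.0.2.11"),
})


def rate_limit_key(remote_addr: str, headers: dict,
                   trusted=TRUSTED_PROXIES) -> str:
    """Return the IP address that a rate limiter should count against.

    The first X-Forwarded-For entry is used only when the directly
    connected peer is a trusted proxy; otherwise the header could be
    spoofed, so the peer address itself is used.
    """
    peer = ipaddress.ip_address(remote_addr)
    if peer not in trusted:
        return str(peer)
    first = headers.get("X-Forwarded-For", "").split(",")[0].strip()
    try:
        return str(ipaddress.ip_address(first))
    except ValueError:
        # Header missing or malformed: fall back to the peer address.
        return str(peer)
```

Falling back to the peer address when the header is absent or malformed keeps the limiter conservative: a request is never counted against an address that could not be validated.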
