Overview

What is YQL?

The YQL Web Service enables applications to query, filter, and combine data from different sources across the Internet. YQL statements have a SQL-like syntax, familiar to any developer with database experience. The following YQL statement, for example, retrieves geo data for Sunnyvale, CA:

select * from geo.places where text="sunnyvale, ca"

To access the YQL Web Service, a Web application can call HTTP GET, passing the YQL statement as a URL parameter, for example:

http://query.yahooapis.com/v1/public/yql?q=select * from geo.places where text="sunnyvale, ca"
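The statement in the URL above contains spaces, quotes, and other characters that need to be percent-encoded before the request is sent. The following Python sketch shows one way to build the encoded URL. The helper name build_yql_url is hypothetical; only the endpoint and the q parameter come from the text above.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Public endpoint as given in the text above.
YQL_PUBLIC_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(statement, endpoint=YQL_PUBLIC_ENDPOINT):
    """Return a GET URL with the YQL statement percent-encoded in the q parameter.

    build_yql_url is a hypothetical helper, not part of any YQL client library.
    """
    return endpoint + "?" + urlencode({"q": statement})


statement = 'select * from geo.places where text="sunnyvale, ca"'
url = build_yql_url(statement)

# Decoding the query string again recovers the original statement unchanged.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(url)
print(decoded == statement)
```

Note that urlencode encodes spaces as "+"; pass quote_via=urllib.parse.quote if "%20" is preferred.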

When it processes a query, the YQL Web Service accesses a datasource on the Internet, transforms the data, and returns the results in either XML or JSON format. YQL can access several types of datasources, including Yahoo Web Services, other Web services, and Web content in formats such as HTML, XML, RSS, and Atom.

Why Use YQL?

The YQL Web Service offers the following benefits:

  • Because it resembles SQL, the syntax of YQL is already familiar to many developers. YQL hides the complexity of Web service APIs by presenting data as simple tables, rows, and columns.
  • YQL includes pre-defined tables for popular Yahoo Web services such as Flickr, Social, MyBlogLog, and Search.
  • YQL can access services on the Internet that output data in the following formats: HTML, XML, JSON, RSS, Atom, and microformat.
  • YQL is extensible, allowing you to define Open Data Tables to access datasources other than Yahoo Web Services. This feature enables you to mash up (combine) data from multiple Web services and APIs, exposing the data as a single YQL table.
  • You can choose either XML or JSON for the format of the results returned by requests to YQL.
  • YQL sub-selects enable you to join data from disparate datasources on the Web. YQL returns the data in a structured document, with elements that resemble rows in a table.
  • With YQL, you can filter the data returned with an expression that is similar to the WHERE clause of SQL.
  • When processing data from large tables, you can page through the query results.
  • The YQL Console enables you to run YQL statements interactively from your browser. The console includes runnable sample queries so that you can quickly learn YQL. For a quick introduction to the console, see The Two-Minute Tutorial.

Usage Information and Limits

The following information describes the use, performance, dependencies, and limits of the YQL Web service. If you have additional questions, please read the YQL Terms of Service or send an email to yql-questions@yahoo-inc.com.

Usage Information:

  • YQL can be used for commercial purposes, with Yahoo approval.
  • If you would like to use YQL commercially, please contact us at yql-commercial [@] yahoo-inc.com and we will review your request.
  • Yahoo will notify users six months in advance with an announcement on this Web page and in our forum if it intends to discontinue or make backwards incompatible changes to the YQL Web Service.
  • YQL has a performance uptime target of over 99.5%.
  • YQL relies on the correct operation of the Web services and content providers it accesses.
  • YQL rate limits are subject to the rate limits of other Yahoo and 3rd-party Web services, and all rates are subject to change.

Rate Limits:

                Public                        OAuth with API Key
YQL Endpoint    /v1/public/*                  /v1/yql/*
Hourly Cap      2,000 requests/hour per IP    20,000 requests/hour per IP
Daily Cap       None                          100,000 total requests/day per API Key

To see how the rate limits in the table above apply, consider the following example. Suppose you create an application that generates around 3,000 requests per hour from each user. On the public endpoint, each user (IP address) is capped at 2,000 requests per hour, so you register an API Key with Yahoo to use the authenticated/authorized endpoint. Each user can now make up to 20,000 requests per hour, and the application as a whole can make up to 100,000 requests per day with that API Key.
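The arithmetic behind this example can be sketched in a few lines of Python. The constants are taken from the rate limit table; the function names are hypothetical, and the sketch assumes a steady request rate.

```python
# Caps taken from the rate limit table above.
PUBLIC_HOURLY_CAP = 2_000    # requests/hour per IP on /v1/public/*
OAUTH_HOURLY_CAP = 20_000    # requests/hour per IP on /v1/yql/*
OAUTH_DAILY_CAP = 100_000    # total requests/day per API Key


def served_per_hour(demand_per_hour, hourly_cap):
    """Requests actually served in one hour: demand, clipped at the hourly cap."""
    return min(demand_per_hour, hourly_cap)


def hours_until_daily_cap(requests_per_hour, daily_cap):
    """Hours of steady traffic at the given rate before the daily cap is reached."""
    return daily_cap / requests_per_hour


# A user generating 3,000 requests/hour is clipped on the public endpoint...
public_served = served_per_hour(3_000, PUBLIC_HOURLY_CAP)
# ...but fits within the hourly cap on the OAuth endpoint.
oauth_served = served_per_hour(3_000, OAUTH_HOURLY_CAP)
# Traffic at the full OAuth hourly cap uses up the daily cap in a few hours.
hours_at_full_rate = hours_until_daily_cap(OAUTH_HOURLY_CAP, OAUTH_DAILY_CAP)

print(public_served, oauth_served, hours_at_full_rate)
```

Because the daily cap applies per API Key rather than per IP, the total across all users of an application is what counts toward it.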

Limiting Access to Content Provider Data

Content or API providers can opt out of or restrict YQL access to their data by following the instructions in the sections below. Please remember that it is your responsibility to obtain the necessary permissions from the content or API providers to use their content or services, separate from your use of YQL: neither Yahoo nor your use of YQL covers those permissions.

Blocking HTML Data Scraping from YQL

YQL uses the robots.txt file on your server to determine the Web pages accessible from your site. YQL uses the user-agent “Yahoo Pipes 2.0” when accessing the robots.txt file and checks it for allows/disallows from this user agent. If the robots.txt check does not prevent YQL from accessing your content, it will then fetch the target page using a different user agent:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14

Therefore, to deny YQL access to your content, simply add “Yahoo Pipes 2.0” to the relevant parts of your robots.txt. For example:

User-agent: Yahoo Pipes 2.0
Disallow: /
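To check that a rule like this has the intended effect, you can run your robots.txt through a standard parser. The sketch below uses Python's urllib.robotparser, which is a generic robots.txt implementation; it is an assumption that YQL's own matching behaves the same way. The helper name is_allowed and the agent name ExampleBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# User agent that YQL uses for the robots.txt check, per the text above.
YQL_ROBOTS_AGENT = "Yahoo Pipes 2.0"

ROBOTS_TXT = """\
User-agent: Yahoo Pipes 2.0
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given user agent may fetch url under robots_txt.

    Hypothetical helper built on the standard-library parser; it only
    approximates how a robots.txt consumer such as YQL might decide.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


yql_allowed = is_allowed(ROBOTS_TXT, YQL_ROBOTS_AGENT, "http://example.com/page.html")
other_allowed = is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/page.html")
print(yql_allowed, other_allowed)
```

With the two-line rule above, the parser denies the YQL user agent and leaves other agents unaffected.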

Another approach is to block YQL on your Web server. For example, in Apache, add this to your virtual host block in httpd.conf:

SetEnvIfNoCase User-Agent "Yahoo Pipes" noYQL
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=noYQL
</Limit>

Blocking Non-HTML from YQL

YQL fetches content from URLs when requested by a developer. Because YQL is not a Web crawler, it does not follow the robots exclusion protocol for non-HTML data, such as XML or CSV, from a site. To stop YQL from accessing any content on your site, block the YQL user-agent (Yahoo Pipes 2.0) on your Web server.

For example, on Apache servers, add this rule to your virtual host block in httpd.conf:

SetEnvIfNoCase User-Agent "Yahoo Pipes" noYQL
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=noYQL
</Limit>

Rate Limiting by IP Address

YQL lets API providers apply IP-based rate limits accurately: limits can track and count against the YQL developer’s IP address, rather than the IP addresses of the shared proxy servers that YQL uses to access content on the Web.

For outgoing requests to external content and API providers, YQL determines the last valid client IP address connecting to its Web service and then ensures this is the first IP address in the X-FORWARDED-FOR HTTP header.

For example, in the X-FORWARDED-FOR HTTP header below, the request arriving at YQL came from the 1.2.3.4 IP address. IP-rate limiters should use this value rather than the IP addresses of YQL proxy servers.

X-FORWARDED-FOR: 1.2.3.4, 5.6.7.8, 9.10.11.12

We also set the CLIENT-IP HTTP header to this IP address.

For example:

CLIENT-IP: 1.2.3.4

Note

Because these headers are “unsigned,” they can be spoofed. Therefore, providers should only use these headers if the proxy setting them is trusted. The IP addresses of the proxy hosts that should be trusted can be found at https://developer.yahoo.com/yql/proxy.txt. This file will be updated as our proxy hosts change.
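On the provider side, the advice above might be applied as in the following Python sketch: the first X-FORWARDED-FOR entry is used only when the connecting peer is a trusted proxy. The function name and the trusted_proxies set are hypothetical; in practice the set would be populated from the proxy list mentioned above, whose format is not described here.

```python
import ipaddress


def rate_limit_ip(forwarded_for, peer_ip, trusted_proxies):
    """Pick the IP address a rate limiter should count against.

    forwarded_for   -- value of the X-FORWARDED-FOR header, or None
    peer_ip         -- address of the host that opened the connection
    trusted_proxies -- set of proxy addresses the provider trusts
                       (hypothetical; load it from the published proxy list)
    """
    # The header is unsigned, so ignore it unless a trusted proxy set it.
    if peer_ip not in trusted_proxies or not forwarded_for:
        return peer_ip
    first = forwarded_for.split(",")[0].strip()
    try:
        # Validate the entry so malformed values fall back to the peer address.
        return str(ipaddress.ip_address(first))
    except ValueError:
        return peer_ip


header = "1.2.3.4, 5.6.7.8, 9.10.11.12"
trusted = {"9.10.11.12"}  # illustrative value only

from_trusted = rate_limit_ip(header, "9.10.11.12", trusted)
from_untrusted = rate_limit_ip(header, "203.0.113.7", trusted)
print(from_trusted, from_untrusted)
```

Falling back to the peer address for untrusted or malformed input means a spoofed header cannot be used to evade the limit.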

Internationalization Support

Character Encoding

YQL supports most of the character sets in the IANA Character Sets Registry. YQL uses the HTTP header Content-Type in the request to determine the character encoding for the response body. If no character encoding is specified, YQL uses the default, UTF-8. The YQL statement can also set the character encoding for the body with the key charset. If the character encoding is specified in both places, the character set specified by charset takes precedence.
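The precedence rule can be summarized in a short Python sketch. It only models the rule as described in this section; it is not YQL's implementation, and the function names are hypothetical.

```python
from email.message import Message

DEFAULT_CHARSET = "utf-8"  # default stated in the text above


def header_charset(content_type):
    """Extract the charset parameter from a Content-Type header value, if any."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def effective_charset(content_type=None, statement_charset=None):
    """Model the documented precedence: charset key, then header, then UTF-8."""
    if statement_charset:
        return statement_charset.lower()
    return header_charset(content_type) or DEFAULT_CHARSET


header_only = effective_charset("text/html; charset=iso-8859-1")
both = effective_charset("text/html; charset=iso-8859-1", "utf-16")
neither = effective_charset()
print(header_only, both, neither)
```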

For example, to request that YQL use ISO/IEC 8859-1 to encode the response body, do one of the following:

  • In your request, set the HTTP header Content-Type as shown below:

    Content-Type: text/html; charset=iso-8859-1

  • In the YQL statement, specify the character set with the key charset as shown below:

    select * from html where url='http://example.com' and charset='iso-8859-1'

    Note

    The YQL built-in function sort only correctly sorts results in English.