
Using YQL to get summary information on a stock from Yahoo Finance via a single HTTP Get

I'd like to be able to get Yahoo Finance's information on a stock's Company, Exchange, Sector, Industry, # employees, and Historical Price Start & End dates all from one HTTP request.

Normally to get that data you need to fetch & scrape:

http://finance.yahoo.com/q?s=yhoo (to get Company & Exchange)
http://finance.yahoo.com/q/pr?s=yhoo (to get Sector, Industry, # Employees)
http://finance.yahoo.com/q/hp?s=yhoo (to get historical prices date range)

The following YQL queries get me closer to the correct result than just fetching the pages. To get the Company and Market:

CODE
   select h2,span from html where url="http://finance.yahoo.com/q?s=YHOO" and xpath='//div[@class="yfi_quote_summary"]/div[1]'

Gives:

CODE
    <div>
<h2>Yahoo! Inc.</h2>
<span>(NasdaqGS: YHOO)</span>
</div>

Using a Yahoo Pipe (built from the String Builder, String Replace, YQL, Rename, and Regex modules; see http://pipes.yahoo.com/pipes/pipe.edit?_id...83hG6zDMEx0JrDg), I was able to change that to:

* Market : NasdaqGS
* Symbol : YHOO
* FullName : Yahoo! Inc.

(Though I don't know how to run a Pipe programmatically and get back the result. Also, while my Pipe shows the correct results in the Debugger window, the list is empty when I actually run the Pipe.)
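For comparison, here is a minimal sketch of the same split done outside Pipes, in Python. It assumes the span text always has the "(Market: SYMBOL)" shape shown above; the function name `parse_market_symbol` is made up for illustration.

```python
import re

def parse_market_symbol(span_text, full_name):
    """Split a '(Market: SYMBOL)' string into Market, Symbol and FullName."""
    # Assumed shape: opening paren, market, colon, symbol, closing paren.
    match = re.match(r"^\(([^:]+):\s*([^)]+)\)$", span_text.strip())
    if match is None:
        return None  # unexpected format; let the caller decide what to do
    return {
        "Market": match.group(1),
        "Symbol": match.group(2),
        "FullName": full_name,
    }

info = parse_market_symbol("(NasdaqGS: YHOO)", "Yahoo! Inc.")
print(info)
```

Returning `None` on a mismatch keeps a page-layout change from silently producing wrong values.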

For the Sector, Industry, and # employees:

CODE
select * from html where url="http://finance.yahoo.com/q/pr?s=YHOO" and xpath='//table[@class="yfnc_datamodoutline1"][1]/tr/td/table/tr'

This gives the rows below, along with a lot of other rows that I didn't want (the XPath [1] predicate, which I added to limit the match to the first matching table, doesn't seem to work):

CODE
    <tr>
<td class="yfnc_tablehead1" width="50%">
<p>Sector:</p>
</td>
<td class="yfnc_tabledata1">
<a href="http://biz.yahoo.com/p/8conameu.html">Technology</a>
</td>
</tr>
<tr>
<td class="yfnc_tablehead1" width="50%">
<p>Industry:</p>
</td>
<td class="yfnc_tabledata1">
<a href="http://biz.yahoo.com/ic/851.html">Internet Information Providers</a>
</td>
</tr>
<tr>
<td class="yfnc_tablehead1" width="50%">
<p>Full Time Employees:</p>
</td>
<td class="yfnc_tabledata1">
<p>13,600</p>
</td>
</tr>
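On the [1] predicate: in standard XPath, `//table[@class="..."][1]` selects every matching table that is the first one under its own parent, while `(//table[@class="..."])[1]` selects the first one in the whole document. Whether that explains the YQL behaviour here is an assumption, not something verified against the html table. The sketch below shows the per-parent effect on a made-up two-table document using Python's `xml.etree.ElementTree`, whose limited XPath subset has no parenthesised form, so the document-wide first match is taken by indexing the result list.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: two tables with the same class, each in its own <div>.
doc = ET.fromstring("""
<body>
  <div><table class="yfnc_datamodoutline1" id="details"/></div>
  <div><table class="yfnc_datamodoutline1" id="index"/></div>
</body>""")

# [1] is evaluated per parent, so both tables qualify.
per_parent = doc.findall(".//table[@class='yfnc_datamodoutline1'][1]")

# First match in document order: take all matches, then index the list.
first_in_doc = doc.findall(".//table[@class='yfnc_datamodoutline1']")[0]

print([t.get("id") for t in per_parent])
print(first_in_doc.get("id"))
```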

Using a Yahoo Pipe with Truncate & Tail modules I was able to limit the results to the above <tr>s. But a Loop over Item Builder doesn't seem to be able to assign item.td.p = item.td.a.content?

What I want is to end up with attributes something like:
Sector = Technology
Industry = Internet Information Providers
Full Time Employees = 13,600
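A minimal sketch of that label/value mapping, done in Python on the <tr> rows shown above rather than in a Pipe. The `<rows>` wrapper element and the function name `rows_to_properties` are assumptions added so the fragment parses as one XML document.

```python
import xml.etree.ElementTree as ET

# The three rows from the YQL result, wrapped in a made-up <rows> root.
ROWS = """<rows>
<tr><td class="yfnc_tablehead1" width="50%"><p>Sector:</p></td>
<td class="yfnc_tabledata1"><a href="http://biz.yahoo.com/p/8conameu.html">Technology</a></td></tr>
<tr><td class="yfnc_tablehead1" width="50%"><p>Industry:</p></td>
<td class="yfnc_tabledata1"><a href="http://biz.yahoo.com/ic/851.html">Internet Information Providers</a></td></tr>
<tr><td class="yfnc_tablehead1" width="50%"><p>Full Time Employees:</p></td>
<td class="yfnc_tabledata1"><p>13,600</p></td></tr>
</rows>"""

def rows_to_properties(xml_text):
    """Map each row's label cell to the text of its value cell."""
    properties = {}
    for tr in ET.fromstring(xml_text).iter("tr"):
        label_cell, value_cell = tr.findall("td")[:2]
        # itertext() collects the text whether it sits in a <p> or an <a>.
        label = "".join(label_cell.itertext()).strip().rstrip(":")
        value = " ".join("".join(value_cell.itertext()).split())
        properties[label] = value
    return properties

properties = rows_to_properties(ROWS)
for name, value in properties.items():
    print("%s = %s" % (name, value))
```

Reading the cell text with `itertext()` sidesteps the problem of the value living in `td.a` for some rows and `td.p` for others.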

Finally, I didn't get much farther than this for the Historical Prices range:

CODE
select * from html where url="http://finance.yahoo.com/q/hp?s=yhoo" and xpath='//td[@class="yfnc_formbody1"]'

Gives:

CODE
    <td align="center" class="yfnc_formbody1" width="100%">
<table border="0" cellpadding="2" cellspacing="0">
<tr>
<td align="right" nowrap="nowrap">
<strong>Start Date:</strong>
</td>
<td align="center">
<select name="a">
<option value="00">Jan</option>
<option value="01">Feb</option>
<option value="02">Mar</option>
<option selected="selected" value="03">Apr</option>
<option value="04">May</option>
<option value="05">Jun</option>
<option value="06">Jul</option>
<option value="07">Aug</option>
<option value="08">Sep</option>
<option value="09">Oct</option>
<option value="10">Nov</option>
<option value="11">Dec</option>
</select>
</td>
<td align="center">
<input maxlength="2" name="b" size="2" type="text" value="12"/>
</td>
<td align="center">
<input maxlength="4" name="c" size="4" type="text" value="1996"/>
</td>
<td>
<small>Eg. Jan 1, 2003</small>
</td>
</tr>
<tr>
<td align="right" nowrap="nowrap">
<font face="arial" size="-1">
<strong>End Date:</strong>
</font>
</td>
<td align="center">
<select name="d">
<option value="00">Jan</option>
<option value="01">Feb</option>
<option value="02">Mar</option>
<option value="03">Apr</option>
<option value="04">May</option>
<option value="05">Jun</option>
<option selected="selected" value="06">Jul</option>
<option value="07">Aug</option>
<option value="08">Sep</option>
<option value="09">Oct</option>
<option value="10">Nov</option>
<option value="11">Dec</option>
</select>
</td>
<td align="center">
<input maxlength="2" name="e" size="2" type="text" value="30"/>
</td>
<td align="center">
<input maxlength="4" name="f" size="4" type="text" value="2009"/>
</td>
</tr>
</table>
</td>

And while I can see that all the necessary information is there, I don't see any way to convert it, using just YQL or Yahoo Pipes, to:
Start Date = Apr 12 1996
End Date = Jul 30 2009
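For what it's worth, the conversion itself is short once the fragment is in a general-purpose language. The sketch below is Python, not YQL or Pipes, and runs on a trimmed copy of the form above. The field names a to f come from the page; the helper name `form_date` and the shortened option lists are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the date form: only a few <option>s are kept per <select>.
FORM = """<td>
<select name="a"><option value="02">Mar</option>
<option selected="selected" value="03">Apr</option></select>
<input name="b" value="12"/><input name="c" value="1996"/>
<select name="d"><option selected="selected" value="06">Jul</option>
<option value="07">Aug</option></select>
<input name="e" value="30"/><input name="f" value="2009"/>
</td>"""

def form_date(root, month_field, day_field, year_field):
    """Build 'Mon D YYYY' from the selected month option and two inputs."""
    month = root.find(
        ".//select[@name='%s']/option[@selected='selected']" % month_field
    ).text
    day = root.find(".//input[@name='%s']" % day_field).get("value")
    year = root.find(".//input[@name='%s']" % year_field).get("value")
    return "%s %s %s" % (month, day, year)

root = ET.fromstring(FORM)
dates = {
    "Start Date": form_date(root, "a", "b", "c"),
    "End Date": form_date(root, "d", "e", "f"),
}
for name, value in dates.items():
    print("%s = %s" % (name, value))
```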

Is what I am trying to do just not possible with YQL?

Even what I have already is better (I think) than just fetching the original URLs since less parsing needs to be done. But it would be nice if instead of doing three YQL queries and then using a regex to extract the needed info, I could somehow do one YQL query or maybe run a Yahoo Pipe that Unions the three queries and spits out the answer via HTTP.

Maybe I have to use Open Data Tables and execute Javascript? (Yikes!)

3 Replies
  • You're seeing no results when you run the pipe because the Pipes run page expects to see RSS 2.0 element names such as "title" and "description", which your pipe doesn't have. You could use the JSON, PHP or CSV options to output your elements.

    The Union module joins up your information, but as separate items, which I don't think is what you want.

    This pipe pulls together the information. I only tried it on the one stock symbol. You will need to tidy it up as required.
    http://pipes.yahoo.com/pipes/pipe.edit?_id...0b60b70079139c6
  • Wow. I would never have figured that approach out. In particular the ${loop:yql1.4.content} ${loop:yql1.0.value} ${loop:yql1.1.value} idiom was new to me.

    Now that you have a working Pipe, is it possible to get the results via a single HTTP Get request?

    I had given up hope on a Pipe, so I am currently investigating the use of Open Data Table's execute Javascript command. I'll start another topic on my progress so far (and maybe you can help me with those issues also ;)
  • Here's my Open Data Table solution to get a stock's Company Name, Exchange, Sector,
    Industry, # employees, and Historical Price Start & End dates all from one
    HTTP request to the YQL Web Service (behind the scenes it grabs three Yahoo Finance pages for the needed information):

    CODE
    <?xml version="1.0" encoding="UTF-8" ?>
    <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
    <meta>
    <description>Aggregates summary information for a stock on Yahoo Finance</description>
    <sampleQuery>select * from stocks where ticker="yhoo"</sampleQuery>
    </meta>
    <bindings>
    <select itemPath="" produces="XML">
    <urls>
    <url></url>
    </urls>
    <inputs>
    <key id='ticker' type='xs:string' paramType='variable' required='true' />
    </inputs>
    <execute><![CDATA[
    // pad string with leading char
    function pad(s, padchar, padlen) {
    while (s.length < padlen) {
    s = padchar + s;
    }
    return s;
    }

    // Queue the queries
    var url = "http://finance.yahoo.com/q/pr?s="+ticker;
    var profileQuery = y.query("select * from html where url=@url and xpath='//table[@class=\"yfnc_datamodoutline1\"]/tr/td/table/tr' limit 4" , {url:url});

    var url = "http://finance.yahoo.com/q?s="+ticker;
    var quoteQuery = y.query("select h2,span from html where url=@url and xpath='//div[@class=\"yfi_quote_summary\"]/div[1]'" , {url:url});

    var url = "http://finance.yahoo.com/q/hp?s="+ticker;
    var historicalQuery = y.query("select * from html where url=@url and xpath='//option[@selected=\"selected\"] | //input[@maxlength=\"2\"] | //input[@maxlength=\"4\"]'" , {url:url});

    var stock = <stock symbol={ticker}></stock>;

    // First get the company name & market by looking at the Yahoo Quotes Summary page
    var results = quoteQuery.results;
    stock.CompanyName = results.div.h2.toString();
    var marketSymbolStr = results.div.span.toString();
    //var match = marketSymbolStr.match(/^[^:]+:\s+([^)]+)\)$/); matches ticker
    var match = marketSymbolStr.match(/^\(([^:]+):/);
    if (match != null) {
    stock.CompanyName += <Market>{match[1]}</Market>;
    }

    // Get the Historical Price Range
    var results = historicalQuery.results;
    // Pass radix 10 explicitly: some engines parse "08" and "09" as octal and return 0
    startMonth = pad(String(parseInt(results.option[0].@value, 10)+1), "0", 2);
    startDay = pad(results.input[0].@value.toString(), "0", 2);
    startYear = results.input[1].@value.toString();
    endMonth = pad(String(parseInt(results.option[1].@value, 10)+1), "0", 2);
    endDay = pad(results.input[2].@value.toString(), "0", 2);
    endYear = results.input[3].@value.toString();

    startDate = startYear + "-" + startMonth + "-" + startDay;
    endDate = endYear + "-" + endMonth + "-" + endDay;

    stock.appendChild(<start>{startDate}</start>);
    stock.appendChild(<end>{endDate}</end>);

    // Get the Sector, Industry, Full Time Employees
    var results = profileQuery.results;
    for each (var tr in results.tr){
    //Remove trailing colon, and strip whitespace
    var property = tr.td[0].p.text().toString().slice(0, -1).replace(/\s+/g, "");
    switch (property.toLowerCase()) {
    case 'indexmembership':
    continue;
    break;
    case 'fulltimeemployees':
    //Strip commas
    value = tr.td[1].*.text().toString().replace(/,/g, "");
    break;
    default:
    //Convert whitespace to single space
    value = tr.td[1].*.text().toString().replace(/\s+/g, " ");
    break;
    }
    stock.appendChild(<{property}>{value}</{property}>);
    }

    response.object = stock

    ]]></execute>
    </select>
    </bindings>
    </table>

    (See it in the YQL Console)

    Unfortunately, the query to get the Yahoo Profile page often fails with a
    "500 Internal Server Error" message?

    Here's an example of its use with the Python 2.5 interpreter
    and the 3rd party lxml module:

    CODE
    >>> import urllib2
    >>> url = "http://query.yahooapis.com/v1/public/yql?q=use%20%22http%3A%2F%2Fsites.google.com%2Fsite%2Ftrialballoonproject%2Fyahoo.finance.stocks.xml%22%20as%20stocks%3B%0Aselect%20*%20from%20stocks%20where%20ticker%3D%22yhoo%22%0A&format=xml"
    >>> response = urllib2.urlopen(url)
    >>> pageData = response.read()
    >>> from lxml import etree
    >>> root = etree.fromstring(pageData)
    >>> stock=root.find(".//stock")
    >>> stock.get("symbol")
    'yhoo'
    >>> for property in stock:
    ...     print "%s = %s" % (property.tag, property.text)
    ...
    CompanyName = Yahoo! Inc.
    Market = NasdaqGS
    Sector = Technology
    Industry = Internet Information Providers
    FullTimeEmployees = 13600
    start = 1996-04-12
    end = 2009-08-02
    >>>
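The long percent-encoded URL in the session above is easier to read when it is built from its parts. This is a minimal Python 3 sketch using only the standard library. The endpoint and table URL are the ones from the session; the function names and the `SAMPLE` response are assumptions made up for illustration, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ENDPOINT = "http://query.yahooapis.com/v1/public/yql"
TABLE_URL = ("http://sites.google.com/site/trialballoonproject/"
             "yahoo.finance.stocks.xml")

def build_yql_url(ticker):
    """Return the single GET URL that runs the Open Data Table query."""
    statement = 'use "%s" as stocks; select * from stocks where ticker="%s"' % (
        TABLE_URL, ticker)
    return ENDPOINT + "?" + urlencode({"q": statement, "format": "xml"})

def stock_properties(xml_text):
    """Extract the symbol and (tag, text) pairs from a response document."""
    stock = ET.fromstring(xml_text).find(".//stock")
    return stock.get("symbol"), [(child.tag, child.text) for child in stock]

# Assumed, simplified response shape; a real response carries more metadata.
SAMPLE = """<query><results><stock symbol="yhoo">
<CompanyName>Yahoo! Inc.</CompanyName><Market>NasdaqGS</Market>
<Sector>Technology</Sector></stock></results></query>"""

print(build_yql_url("yhoo"))
print(stock_properties(SAMPLE))
```

To fetch for real, the URL from `build_yql_url` could be passed to `urllib.request.urlopen` and the body handed to `stock_properties`.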