I'd like to be able to get Yahoo Finance's information on a stock's Company, Exchange, Sector, Industry, # employees, and Historical Price Start & End dates all from one HTTP request.
Normally to get that data you need to fetch & scrape:
http://finance.yahoo.com/q?s=yhoo (to get Company & Exchange)
http://finance.yahoo.com/q/pr?s=yhoo (to get Sector, Industry, # Employees)
http://finance.yahoo.com/q/hp?s=yhoo (to get historical prices date range)
The following YQL queries get me closer to the correct result than just fetching the pages. To get the Company and Market:
CODE
select h2,span from html where url="http://finance.yahoo.com/q?s=YHOO" and xpath='//div[@class="yfi_quote_summary"]/div[1]'
Gives:
CODE
<div>
<h2>Yahoo! Inc.</h2>
<span>(NasdaqGS: YHOO)</span>
</div>
Using a Yahoo Pipe built from the String Builder, String Replace, YQL, Rename, and Regex modules (see
http://pipes.yahoo.com/pipes/pipe.edit?_id...83hG6zDMEx0JrDg for the pipe), I was able to change that to:
* Market : NasdaqGS
* Symbol : YHOO
* FullName : Yahoo! Inc.
(Though I don't know how to run a Pipe programmatically and get the result back. Also, while my Pipe shows the correct results in the Debugger window, the List comes back empty when the Pipe is actually run, and I don't know why.)
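To make the target concrete, here is a minimal Python sketch of the transformation I'm after, run on the YQL result above rather than on the live page. The function name parse_quote_summary and the regex are my own illustration, and it assumes the span always has the "(Market: Symbol)" shape shown above.

```python
import re
import xml.etree.ElementTree as ET

# Sample copied from the YQL result above.
SAMPLE = """<div>
<h2>Yahoo! Inc.</h2>
<span>(NasdaqGS: YHOO)</span>
</div>"""


def parse_quote_summary(xml_text):
    """Split the <h2>/<span> pair into FullName, Market and Symbol."""
    root = ET.fromstring(xml_text)
    full_name = root.findtext("h2").strip()
    span = root.findtext("span").strip()
    # Assumption: the span always looks like "(NasdaqGS: YHOO)".
    match = re.match(r"\(\s*([^:]+?)\s*:\s*([^)]+?)\s*\)$", span)
    if match is None:
        raise ValueError("unexpected span format: %r" % span)
    return {
        "Market": match.group(1),
        "Symbol": match.group(2),
        "FullName": full_name,
    }


print(parse_quote_summary(SAMPLE))
```

This is the post-processing I would rather not have to do on the client side.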
For the Sector, Industry, and # employees:
CODE
select * from html where url="http://finance.yahoo.com/q/pr?s=YHOO" and xpath='//table[@class="yfnc_datamodoutline1"][1]/tr/td/table/tr'
Gives the following, along with a bunch of other rows that I didn't really want (the XPath [1] predicate, which I expected to limit the path to only the first matching table, doesn't seem to work):
CODE
<tr>
<td class="yfnc_tablehead1" width="50%">
<p>Sector:</p>
</td>
<td class="yfnc_tabledata1">
<a href="http://biz.yahoo.com/p/8conameu.html">Technology</a>
</td>
</tr>
<tr>
<td class="yfnc_tablehead1" width="50%">
<p>Industry:</p>
</td>
<td class="yfnc_tabledata1">
<a href="http://biz.yahoo.com/ic/851.html">Internet Information Providers</a>
</td>
</tr>
<tr>
<td class="yfnc_tablehead1" width="50%">
<p>Full Time Employees:</p>
</td>
<td class="yfnc_tabledata1">
<p>13,600</p>
</td>
</tr>
Using a Yahoo Pipe with the Truncate and Tail modules, I was able to limit the results to the <tr>s above. However, a Loop over an Item Builder doesn't seem to be able to assign item.td.p = item.td.a.content.
What I want is a set of attributes along these lines:
Sector = Technology
Industry = Internet Information Providers
Full Time Employees = 13,600
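As a rough sketch of that mapping outside of Pipes, the Python below turns each two-cell row into a label/value pair. The rows are copied from the YQL result above and wrapped in a <table> element of my own so they parse as one document; rows_to_attributes is a hypothetical helper name. Reading the cell text with itertext() sidesteps the <p> versus <a> difference that tripped up my Item Builder.

```python
import xml.etree.ElementTree as ET

# Rows copied from the YQL result above; the <table> wrapper is added
# only so that the three rows form a single well-formed document.
ROWS = """<table>
<tr>
<td class="yfnc_tablehead1" width="50%"><p>Sector:</p></td>
<td class="yfnc_tabledata1"><a href="http://biz.yahoo.com/p/8conameu.html">Technology</a></td>
</tr>
<tr>
<td class="yfnc_tablehead1" width="50%"><p>Industry:</p></td>
<td class="yfnc_tabledata1"><a href="http://biz.yahoo.com/ic/851.html">Internet Information Providers</a></td>
</tr>
<tr>
<td class="yfnc_tablehead1" width="50%"><p>Full Time Employees:</p></td>
<td class="yfnc_tabledata1"><p>13,600</p></td>
</tr>
</table>"""


def rows_to_attributes(xml_text):
    """Map each label cell to the text of its data cell."""
    attributes = {}
    for row in ET.fromstring(xml_text).iter("tr"):
        cells = row.findall("td")
        if len(cells) != 2:
            continue  # skip rows that are not simple label/value pairs
        # itertext() collects the text whether it sits in a <p> or an <a>.
        label = "".join(cells[0].itertext()).strip().rstrip(":")
        value = "".join(cells[1].itertext()).strip()
        attributes[label] = value
    return attributes


print(rows_to_attributes(ROWS))
```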
Finally, I didn't get much farther than this for the Historical Prices range:
CODE
select * from html where url="http://finance.yahoo.com/q/hp?s=yhoo" and xpath='//td[@class="yfnc_formbody1"]'
Gives:
CODE
<td align="center" class="yfnc_formbody1" width="100%">
<table border="0" cellpadding="2" cellspacing="0">
<tr>
<td align="right" nowrap="nowrap">
<strong>Start Date:</strong>
</td>
<td align="center">
<select name="a">
<option value="00">Jan</option>
<option value="01">Feb</option>
<option value="02">Mar</option>
<option selected="selected" value="03">Apr</option>
<option value="04">May</option>
<option value="05">Jun</option>
<option value="06">Jul</option>
<option value="07">Aug</option>
<option value="08">Sep</option>
<option value="09">Oct</option>
<option value="10">Nov</option>
<option value="11">Dec</option>
</select>
</td>
<td align="center">
<input maxlength="2" name="b" size="2" type="text" value="12"/>
</td>
<td align="center">
<input maxlength="4" name="c" size="4" type="text" value="1996"/>
</td>
<td>
<small>Eg. Jan 1, 2003</small>
</td>
</tr>
<tr>
<td align="right" nowrap="nowrap">
<font face="arial" size="-1">
<strong>End Date:</strong>
</font>
</td>
<td align="center">
<select name="d">
<option value="00">Jan</option>
<option value="01">Feb</option>
<option value="02">Mar</option>
<option value="03">Apr</option>
<option value="04">May</option>
<option value="05">Jun</option>
<option selected="selected" value="06">Jul</option>
<option value="07">Aug</option>
<option value="08">Sep</option>
<option value="09">Oct</option>
<option value="10">Nov</option>
<option value="11">Dec</option>
</select>
</td>
<td align="center">
<input maxlength="2" name="e" size="2" type="text" value="30"/>
</td>
<td align="center">
<input maxlength="4" name="f" size="4" type="text" value="2009"/>
</td>
</tr>
</table>
</td>
And while I can see that all the necessary information is there, I don't see any way, with just YQL or Yahoo Pipes, to convert that to:
Start Date = Apr 12 1996
End Date = Jul 30 2009
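For comparison, this is roughly what the conversion looks like when done client-side in Python. It is only a sketch: the form below is a trimmed copy of the YQL result above (most <option>s and the layout cells are dropped), the helper names are my own, and it assumes the field names a/b/c and d/e/f keep meaning start and end month/day/year.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the YQL result above.
FORM = """<td class="yfnc_formbody1">
<select name="a">
<option value="02">Mar</option>
<option selected="selected" value="03">Apr</option>
</select>
<input name="b" type="text" value="12"/>
<input name="c" type="text" value="1996"/>
<select name="d">
<option selected="selected" value="06">Jul</option>
<option value="07">Aug</option>
</select>
<input name="e" type="text" value="30"/>
<input name="f" type="text" value="2009"/>
</td>"""


def field_text(root, path, attribute=None):
    """Return the text (or one attribute) of the first node matching path."""
    node = root.find(path)
    if node is None:
        raise ValueError("missing form field: " + path)
    return node.get(attribute) if attribute else node.text


def form_date(root, month, day, year):
    """Join the selected month label with the day and year input values."""
    return " ".join([
        field_text(root, ".//select[@name='%s']/option[@selected]" % month),
        field_text(root, ".//input[@name='%s']" % day, "value"),
        field_text(root, ".//input[@name='%s']" % year, "value"),
    ])


def date_range(xml_text):
    root = ET.fromstring(xml_text)
    return {
        "Start Date": form_date(root, "a", "b", "c"),
        "End Date": form_date(root, "d", "e", "f"),
    }


print(date_range(FORM))
```

The ".//" prefix means the same lookups should also work on the untrimmed result, where the fields sit inside nested table cells.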
Is what I am trying to do just not possible with YQL?
Even what I have already is better (I think) than just fetching the original URLs since less parsing needs to be done. But it would be nice if instead of doing three YQL queries and then using a regex to extract the needed info, I could somehow do one YQL query or maybe run a Yahoo Pipe that Unions the three queries and spits out the answer via HTTP.
Maybe I have to use Open Data Tables and execute JavaScript? (Yikes!)
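For reference, the three-request approach I'm trying to avoid would look something like the Python sketch below. It only builds the request URLs and does not fetch anything. The endpoint is an assumption on my part (the public YQL REST URL, which is not stated anywhere above), and yql_urls and the query names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public YQL REST endpoint; not taken from the post above.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

# The three queries from this post, with the ticker as a placeholder.
QUERY_TEMPLATES = {
    "summary": (
        'select h2,span from html '
        'where url="http://finance.yahoo.com/q?s={symbol}" '
        "and xpath='//div[@class=\"yfi_quote_summary\"]/div[1]'"
    ),
    "profile": (
        'select * from html '
        'where url="http://finance.yahoo.com/q/pr?s={symbol}" '
        "and xpath='//table[@class=\"yfnc_datamodoutline1\"][1]/tr/td/table/tr'"
    ),
    "history": (
        'select * from html '
        'where url="http://finance.yahoo.com/q/hp?s={symbol}" '
        "and xpath='//td[@class=\"yfnc_formbody1\"]'"
    ),
}


def yql_urls(symbol, fmt="xml"):
    """Build one request URL per query for the given ticker symbol."""
    urls = {}
    for name, template in QUERY_TEMPLATES.items():
        query = template.format(symbol=symbol)
        urls[name] = YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})
    return urls


for name, url in yql_urls("YHOO").items():
    print(name, url)
```

That is still three HTTP requests plus client-side parsing, which is exactly what I'm hoping a single YQL query or Pipe could replace.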