Open Data Table's execute Javascript quirks?

This is a very long message but in summary, I've noticed the following YQL quirks that I need help with:

  1. y.query() with select * from html sometimes inserts extra <p> tags.
  2. There are periods when YQL gets Internal Server Error responses back from Yahoo Finance.
  3. String regex replacements give TypeError: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
  4. y.log(stock) seems to be able to display the future value of my stock variable.
  5. A y.log(stock) inside a for each loop sometimes gives Exception: Can't find method log(string,function) errors.

Here are the details...

While I'm a novice at YQL, Open Data Tables, Javascript, E4X, XPath, and Firebug, I have managed to get some answers to my question "Using YQL to get summary information on a stock from Yahoo Finance via a single HTTP Get". hapdaniel came up with a Pipe solution, but here's my progress on using a YQL Open Data Table's execute Javascript sub-element.

It's definitely possible to use multiple y.query() statements inside an execute sub-element. Here are some techniques I used during my discovery process.

First of all, I made a very simple ODT that runs a YQL query on the Yahoo Finance Profile page for a stock. It just spits back the results of the y.query() (See it in the YQL Console):

CODE
<?xml version="1.0" encoding="UTF-8" ?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
    <description>Aggregates summary information for a stock on Yahoo Finance</description>
    <sampleQuery>select * from stocks where ticker="yhoo"</sampleQuery>
  </meta>
  <bindings>
    <select itemPath="" produces="XML">
      <urls>
        <url></url>
      </urls>
      <inputs>
        <key id='ticker' type='xs:string' paramType='variable' required='true' />
      </inputs>
      <execute><![CDATA[

        // Queue the query
        var url = "http://finance.yahoo.com/q/pr?s="+ticker;
        var profileQuery = y.query("select * from html where url=@url and xpath='//table[@class=\"yfnc_datamodoutline1\"]/tr/td/table/tr' limit 4" , {url:url});

        // Get the Sector, Industry, Full Time Employees
        var results = profileQuery.results;

        response.object = results;

      ]]></execute>
    </select>
  </bindings>
</table>

I could have used y.log(results) to see the results in the diagnostics section, but that isn't formatted as nicely (entities used for < & > and no newlines).

I could have also used the <url> element and changed the paramType of my input <key> to path to automatically get the ticker symbol filled into the url. However, in the final ODT I'll need to query three urls and I haven't figured out how to get any but the last one back out of the <urls> element.

(I notice when running the query that it can take 4-10 times as long to get my tiny ODT .xml file back from sites.google.com as it does to grab the profile page from Yahoo Finance or run a YQL query on it.)
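
The execute block above builds the profile URL by plain string concatenation. A minimal sketch of the same step with the ticker escaped first is below. buildProfileUrl is a hypothetical helper name, not part of YQL, and the sketch assumes the standard encodeURIComponent global is available wherever it runs.

```javascript
// Hypothetical helper: build the Yahoo Finance profile URL for a ticker.
// The base URL is the one used in the ODT above; the escaping is an addition.
function buildProfileUrl(ticker) {
  return "http://finance.yahoo.com/q/pr?s=" + encodeURIComponent(ticker);
}

console.log(buildProfileUrl("yhoo"));   // http://finance.yahoo.com/q/pr?s=yhoo
console.log(buildProfileUrl("^GSPC"));  // http://finance.yahoo.com/q/pr?s=%5EGSPC
```

Escaping only matters for symbols that contain characters such as ^ or &; a plain symbol like yhoo comes out unchanged.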

Since I'm not very adept at using E4X to extract information from XML, it is imperative that I can somehow interactively play with the query results. Therefore in Firefox, I browse to the same profile page and then open up Firebug. While it's easy to inspect any element on the page by clicking on it, it doesn't seem to be possible to then make an E4X XML object out of that node?

For example, I right-clicked on the relevant <tbody> node in the HTML tab, and picked Copy XPath. I then executed the following commands in the Console:

CODE
>>> detailsBody = $x("/html/body/div/div[3]/table[2]/tbody/tr[2]/td/table[2]/tbody/tr/td/table/tbody")
[ tbody ]
>>> detailsBody[0].firstChild.firstChild.textContent
"Index Membership:"
>>> xmlBody = new XML(detailsBody[0])
TypeError: can't convert new XML(detailsBody[0]) to XML

The Wikipedia entry on ECMAScript for XML has this to say:
QUOTE
Most E4X implementations don't have a means to directly import and export DOM nodes to/from the E4X model, although parsers can be made to handle the E4X.

so maybe it just isn't possible to do this.

Instead, what I do is copy the <results> XML from the YQL Console and paste it into the Firebug Console, preceding it with "results=". This makes an E4X XML object that I can then manipulate directly from the Firebug Console.

CODE
>>> results..p
<p>Index Membership:</p>
<p>Sector:</p>
<p>Industry:</p>
<p>Full Time Employees:</p>
<p>13,600</p>

This illustrates the first quirk I've noticed about YQL. When querying from HTML it sometimes puts in extra <p> tags. While the original HTML looks like:

CODE
<tr>
<td class="yfnc_tablehead1" width="50%">Sector:</td>
<td class="yfnc_tabledata1">
<a href="http://biz.yahoo.com/p/8conameu.html">Technology</a>
</td>
</tr>

a YQL select * from html returns:

CODE
<tr>
<td class="yfnc_tablehead1" width="50%">
<p>Sector:</p>
</td>
<td class="yfnc_tabledata1">
<a href="http://biz.yahoo.com/p/8conameu.html">Technology</a>
</td>
</tr>

It isn't much of a problem but it does show the necessity of dumping out y.query() results instead of assuming you know what it's going to give you.
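
One way to make the extraction indifferent to the extra <p> wrapper is to reduce each cell to its text before looking at it. The sketch below does this on serialized markup strings rather than on E4X objects, so it can be tried outside YQL. cellText is a hypothetical helper, and stripping tags with a regex is a crude assumption that only suits simple cells like the ones shown above.

```javascript
// Hypothetical helper: reduce a serialized table cell to its text content,
// whether or not the text is wrapped in an extra <p> element.
function cellText(markup) {
  return markup
    .replace(/<[^>]+>/g, "")  // drop every tag, keep the text between them
    .replace(/\s+/g, " ")     // collapse newlines and runs of spaces
    .trim();                  // remove leading and trailing whitespace
}

var original = '<td class="yfnc_tablehead1" width="50%">Sector:</td>';
var fromYql = '<td class="yfnc_tablehead1" width="50%">\n<p>Sector:</p>\n</td>';

console.log(cellText(original));                       // Sector:
console.log(cellText(original) === cellText(fromYql)); // true
```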

I eventually managed to come up with the following ODT to extract the Sector, Industry, and Full Time Employees from a Yahoo Finance Profile page (See it in the YQL Console):

CODE
<?xml version="1.0" encoding="UTF-8" ?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
    <description>Aggregates summary information for a stock on Yahoo Finance</description>
    <sampleQuery>select * from stocks where ticker="yhoo"</sampleQuery>
  </meta>
  <bindings>
    <select itemPath="" produces="XML">
      <urls>
        <url></url>
      </urls>
      <inputs>
        <key id='ticker' type='xs:string' paramType='variable' required='true' />
      </inputs>
      <execute><![CDATA[

        // Queue the query
        var url = "http://finance.yahoo.com/q/pr?s="+ticker;
        var profileQuery = y.query("select * from html where url=@url and xpath='//table[@class=\"yfnc_datamodoutline1\"]/tr/td/table/tr' limit 4" , {url:url});

        // Get the Sector, Industry, Full Time Employees
        var results = profileQuery.results;
        var stock = <stock symbol={ticker}></stock>;
        //y.log(stock);

        for each (var tr in results.tr){
          var property = tr.td[0].p.text().slice(0, -1).replace(/\s+/g, "");
          switch (property.toLowerCase()) {
            case 'indexmembership':
              continue;
              break;
            case 'fulltimeemployees':
              //value = tr.td[1].*.text().replace(/,/g, "");
              value = tr.td[1].*.text();
              break;
            default:
              value = tr.td[1].*.text();
              break;
          }
          stock.appendChild(<{property}>{value}</{property}>);
        }

        response.object = stock;

      ]]></execute>
    </select>
  </bindings>
</table>

which normally returns:

CODE
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="1" yahoo:created="2009-08-02T12:49:36Z" yahoo:lang="en-US" yahoo:updated="2009-08-02T12:49:36Z" yahoo:uri="http://query.yahooapis.com/v1/yql?q=use+%22http%3A%2F%2Fsites.google.com%2Fsite%2Ftrialballoonproject%2Fyahoo.finance.stocks.profile.xml%22+as+stocks%3B%0Aselect+*+from+stocks+where+ticker%3D%22yhoo%22">
<diagnostics>
<publiclyCallable>true</publiclyCallable>
<url execution-time="285"><![CDATA[http://sites.google.com/site/trialballoonproject/yahoo.finance.stocks.profile.xml]]></url>
<url execution-time="31"><![CDATA[http://finance.yahoo.com/q/pr?s=yhoo]]></url>
<url execution-time="39"><![CDATA[select * from html where url=@url and xpath='//table[@class="yfnc_datamodoutline1"]/tr/td/table/tr' limit 4]]></url>
<javascript execution-time="42" instructions-used="16250" table-name="stocks"/>
<user-time>331</user-time>
<service-time>355</service-time>
<build-version>2426</build-version>
</diagnostics>
<results>
<stock symbol="yhoo">
<Sector>Technology</Sector>
<Industry>Internet Information Providers</Industry>
<FullTimeEmployees>13,600</FullTimeEmployees>
</stock>
</results>
</query>

but this weekend I often saw periods when I got nothing but:
"Server returned HTTP response code: 500 for URL: http://finance.yahoo.com/q/pr?s=yhoo" execution-time="3" http-status-code="500" http-status-message="Internal Server Error"><![CDATA[http://finance.yahoo.com/q/pr?s=yhoo]]></url>

messages. Hopefully it was a temporary issue with YQL.
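
When this happens, the failure is visible in the diagnostics through the http-status-code attribute. As a rough sketch, a client that receives the response as text could scan for those attributes and decide whether to retry. serverErrorCodes is a hypothetical helper, and the sample string is just the fragment quoted above.

```javascript
// Hypothetical helper: collect every http-status-code value of 500 or above
// that appears in a diagnostics string.
function serverErrorCodes(diagnosticsText) {
  var pattern = /http-status-code="(\d{3})"/g;
  var codes = [];
  var match;
  while ((match = pattern.exec(diagnosticsText)) !== null) {
    var code = parseInt(match[1], 10);
    if (code >= 500) {
      codes.push(code);
    }
  }
  return codes;
}

var fragment = '"Server returned HTTP response code: 500 for URL: http://finance.yahoo.com/q/pr?s=yhoo" execution-time="3" http-status-code="500" http-status-message="Internal Server Error"><![CDATA[http://finance.yahoo.com/q/pr?s=yhoo]]></url>';

console.log(serverErrorCodes(fragment).length);  // 1
console.log(serverErrorCodes(fragment)[0]);      // 500
```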

In any case, the other quirks involve the FullTimeEmployees element. For some reason if I try to get rid of the comma by saying value = tr.td[1].*.text().replace(/,/g, "");, I get:

CODE
<log>&lt;stock symbol="yhoo"&gt; &lt;Sector&gt;Technology&lt;/Sector&gt; &lt;Industry&gt;Internet Information Providers&lt;/Industry&gt; &lt;/stock&gt;</log>
<javascript><![CDATA[Exception: TypeError: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified. (<javascript>#21)]]></javascript>

I can do the exact same regex in Firebug and it works fine. Also, getting my property variable involves using a regex to remove all whitespace, so it's not as if regexes don't work at all. I still haven't found the solution to this.
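
For reference, the label clean-up that does work can be tried on a plain string outside YQL. The sketch below mirrors the slice-then-replace chain used for the property variable. toElementName is a hypothetical name, and the String() call is an added assumption to make sure the input is a primitive string.

```javascript
// Hypothetical helper: turn a table label such as "Full Time Employees:"
// into an element name by dropping the trailing colon and all whitespace.
function toElementName(label) {
  return String(label).slice(0, -1).replace(/\s+/g, "");
}

console.log(toElementName("Full Time Employees:"));            // FullTimeEmployees
console.log(toElementName("Index Membership:").toLowerCase()); // indexmembership
```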

Secondly, look carefully at what the y.log() is printing out for the value of the <stock> element. Even though the y.log() statement is before the loop that generates the actual <stock> sub-elements, it is already showing the sub-elements made before the FullTimeEmployees error is hit! Very strange.

Finally, when I try to see what's going on by adding another y.log() inside my for each loop, I get the following different message:

CODE
<log>&lt;stock symbol="yhoo"&gt; &lt;Sector&gt;Technology&lt;/Sector&gt; &lt;Industry&gt;Internet Information Providers&lt;/Industry&gt; &lt;/stock&gt;</log>
<javascript><![CDATA[Exception: Can't find method log(string,function). (<javascript>#20)]]></javascript>

The y.log() before the loop still works fine but the second one fails?

Hopefully, this extremely long message detailing my progress with the Open Data Table execute Javascript sub-element will be of some help to others.

For learning E4X, I found "Using E4X within YQL" and the Mozilla Developer Center's "E4X Tutorial" and "Processing XML with E4X" helpful (although for some reason Firefox 3.0.11 has problems reliably getting the MDC pages and I had to resort to IE instead).

4 Replies
  • Arrgh! Even though my last post looks fine when Previewed, now all the paragraphs are one single long line???

    How can I fix that?
  • I figured out the "quirk" with y.log(). I must have been tired B) because you can only call y.log() with a single object or string, not a list of objects to display.
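
A minimal sketch of working within that restriction: fold everything into one string before logging. The y object here is a hypothetical stand-in so the code runs outside YQL, and it only imitates the single-argument behaviour described in this reply. logValues is likewise a hypothetical helper.

```javascript
// Hypothetical stand-in for YQL's y object so the sketch runs outside YQL.
// It only imitates the one-argument restriction described above.
var logged = [];
var y = {
  log: function (message) {
    if (arguments.length !== 1) {
      throw new Error("Can't find method log (simulated)");
    }
    logged.push(String(message));
  }
};

// Hypothetical helper: fold any number of values into a single string
// so that y.log is always called with exactly one argument.
function logValues() {
  var parts = [];
  for (var i = 0; i < arguments.length; i++) {
    parts.push(String(arguments[i]));
  }
  y.log(parts.join(" | "));
}

logValues("Sector", "Technology");
console.log(logged[0]);  // Sector | Technology
```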
  • Here was my problem with regexes. I was doing:
    value = tr.td[1].*.text().replace(/,/g, "");
    but what I really should have done was:
    value = tr.td[1].*.text().toString().replace(/,/g, "");

    x.text(), where x is an XML object, returns yet another XML object (an XMLList), not a string.
    You have to do x.text().toString() to get a string.

    Presumably the reason tr.td[0].p.text().slice(0, -1).replace(/\s+/g, ""); works
    is that the .slice() call forces the XML to be converted to a string first?
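
One likely explanation, offered as an assumption rather than something confirmed in this thread: the E4X specification (ECMA-357) gives XML objects their own replace(propertyName, value) method, so calling .replace() on the result of .text() never reaches String.prototype.replace. .slice() is not an XML method, so it is applied to the string value instead. The sketch below imitates that shadowing in plain Javascript. FakeXmlText is a hypothetical stand-in, not real E4X, and its error text is simulated.

```javascript
// FakeXmlText is a hypothetical stand-in for the value returned by .text().
// It is NOT real E4X; it only imitates the method shadowing described above.
function FakeXmlText(text) {
  this.text = text;
}

// String conversion, as happens for XML values with simple content.
FakeXmlText.prototype.toString = function () {
  return this.text;
};

// An XML-style replace(propertyName, value) that shadows the string method.
FakeXmlText.prototype.replace = function (propertyName, value) {
  throw new TypeError("simulated: " + String(propertyName) + " is not a valid XML name");
};

// Coerce to a primitive string first, then use the ordinary string method.
function stripCommas(value) {
  return String(value).replace(/,/g, "");
}

var employees = new FakeXmlText("13,600");

var failedWithoutCoercion = false;
try {
  employees.replace(/,/g, "");  // reaches the XML-style replace and throws
} catch (e) {
  failedWithoutCoercion = e instanceof TypeError;
}

console.log(failedWithoutCoercion);   // true
console.log(stripCommas(employees));  // 13600
```

Inside the execute block the equivalent is the .toString() call shown above; String(value) is just another way to force the same conversion.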
