
Newbie YQL question

Hi,

I'm trying to parse an HTML page using an Open Data Table. This is my XML file:

CODE
<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
    <author>me</author>
    <description>html parser</description>
  </meta>
  <bindings>
    <select itemPath="" produces="XML">
      <urls>
        <url>http://myhtmlpage.com/</url>
      </urls>
    </select>
  </bindings>
</table>


In the YQL console I'm typing:
CODE
use "mydatasource.xml" as datasource; select * from datasource;


The problem is that YQL returns null. If anyone can help, please reply.

Thank you!

  • LE: I replaced the actual links and names with placeholders, so they aren't the issue.
  • Luci, by default YQL requires the sources it retrieves to be valid XML or JSON, so the open table in your example can't point at an HTML page unless that HTML is also well-formed XML.

    YQL does, however, offer the specialized html table, which can retrieve any HTML source and transform it into XML/JSON: select * from html where url='http://foo.com/a.html'
  • I changed my table to this one:
    CODE
    <?xml version="1.0" encoding="UTF-8"?>
    <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
      <meta>
        <author>me</author>
        <description>parser</description>
      </meta>
      <bindings>
        <select produces="XML">
          <execute>
            var query = 'select * from html where url=\"http://url.com/\" and xpath=\"/html/body/table/tr[4]/td/table/tr/td[5]/table\"';
            var pageData = y.query(query);
            response.object = pageData;
          </execute>
        </select>
      </bindings>
    </table>


    and I get this result:
    CODE
    <results>
    <result>
    <diagnostics>
    <_>org.mozilla.javascript.UniqueTag@1505f70: NOT_FOUND</_>
    </diagnostics>
    <results>
    <_>org.mozilla.javascript.UniqueTag@1505f70: NOT_FOUND</_>
    </results>
    </result>
    </results>


    Any suggestions?

    Thanks!
  • Hi Luci, what content is on the page you're trying to access? Most likely it just needs a small XPath adjustment.

    - Jon
  • Hi Jon,

    Thank you very much for your patience. I finally managed to get it working.

    Here is the open data table:

    CODE
    <?xml version="1.0" encoding="UTF-8"?>
    <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
      <meta>
        <author>me</author>
        <description>parser</description>
      </meta>
      <bindings>
        <select produces="XML">
          <execute><![CDATA[
            var x = y.rest("http://webpage.com").accept("text/html").get().response;
            var livedata = y.xpath(x, "/html/body/table/tr[4]/td/table/tr/td[5]/table");
            response.object = livedata;
          ]]></execute>
        </select>
      </bindings>
    </table>


    The missing "<![CDATA[ ... ]]>" wrapper was the BIG problem. All I have to do now is find a way to parse the resulting table using E4X. Is there a specific way to do this? What type is the "livedata" object after the xpath function is applied?

    Thanks.
  • OK, I've managed to parse it, but the output object isn't valid XML. Any ideas on how to make it valid XML?

    CODE
    <?xml version="1.0" encoding="UTF-8"?>
    <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
    <meta>
    <author>me</author>
    <description>parser</description>
    </meta>
    <bindings>
    <select produces="XML" >
    <execute><![CDATA[
    var x = y.rest("http://me.com").accept("text/html").get().response;
    var livedata = y.xpath(x, "/html/body/table/tr[4]/td/table/tr/td[5]/table");
    var output = new XML();
    for each (row in livedata.tr){
    if (row.td[0].*.length() != 0) {
    output += <row>{row.*}</row>;
    }
    }
    response.object = output;
    ]]></execute>
    </select>
    </bindings>
    </table>
  • Sorry, I forgot to change the response.object line. Now it's the right one!
    CODE
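    <?xml version="1.0" encoding="UTF-8"?>
    <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
      <meta>
        <author>me</author>
        <description>parser</description>
      </meta>
      <bindings>
        <select produces="XML">
          <execute><![CDATA[
            // Fetch the page as HTML and select the nested data table.
            var x = y.rest("http://me.com").accept("text/html").get().response;
            var livedata = y.xpath(x, "/html/body/table/tr[4]/td/table/tr/td[5]/table");

            // Collect one <row> per table row whose first cell is not empty.
            var output = new XMLList();
            for each (var row in livedata.tr) {
              if (row.td[0].*.length() != 0) {
                output += <row>{row.*}</row>;
              }
            }

            // Wrap the rows in a single root element so the result is valid XML.
            // The braces interpolate the variable; <table>output</table> would
            // contain only the literal text "output".
            response.object = <table>{output}</table>;
          ]]></execute>
        </select>
      </bindings>
    </table>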
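The reply explaining that YQL only accepts well-formed XML or JSON can be checked outside YQL. The following is a minimal Python sketch, not YQL code. The standard-library `xml.etree.ElementTree` parser stands in for any XML-only consumer, and the two sample pages are made up for illustration. Typical HTML with unclosed `<p>` and `<br>` tags is rejected, while the same content written as well-formed XHTML parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample pages: same content, but only the second is well-formed XML.
HTML_PAGE = "<html><body><p>Price<br>42</body></html>"
XHTML_PAGE = "<html><body><p>Price<br/>42</p></body></html>"


def is_well_formed_xml(text: str) -> bool:
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed_xml(HTML_PAGE))   # False: <br> and <p> are never closed
    print(is_well_formed_xml(XHTML_PAGE))  # True
```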
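A missing `<![CDATA[ ... ]]>` wrapper matters because the Open Data Table definition is itself an XML document. Any `<` or `&` in the script inside `<execute>` is therefore read as markup, and E4X literals such as `<row>{row.*}</row>` are exactly that kind of content. The following is a minimal Python sketch, not YQL code, using the standard-library `xml.etree.ElementTree` parser with an illustrative script string. The unwrapped version fails to parse, while the wrapped one parses and keeps the script text intact.

```python
import xml.etree.ElementTree as ET

# Illustrative script body containing characters that are special in XML.
SCRIPT = "if (a < b && ok) { output += <row/>; }"

RAW = "<execute>" + SCRIPT + "</execute>"
WRAPPED = "<execute><![CDATA[" + SCRIPT + "]]></execute>"


def parse_execute(doc: str):
    """Return the script text of an <execute> element, or None if the XML is invalid."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


if __name__ == "__main__":
    print(parse_execute(RAW))      # None: "<" and "&&" are parsed as markup
    print(parse_execute(WRAPPED))  # the script text, unchanged
```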
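The last replies come down to two steps. First, keep only the rows whose first cell has content. Second, put the resulting `<row>` elements under a single root element, because a bare sequence of sibling elements is not a well-formed XML document. In E4X that root has to be written `<table>{output}</table>`, since without the braces the literal text "output" is inserted. The following Python sketch mirrors that logic with the standard library only. It is an analogue, not YQL or E4X code. The sample markup and the path `body/table/tr[2]/td/table` are hypothetical, and ElementTree's limited XPath support stands in for `y.xpath`.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, already well-formed so that ElementTree can parse it.
PAGE = """<html><body><table>
  <tr><td>header</td></tr>
  <tr><td><table>
    <tr><td>Name</td><td>Value</td></tr>
    <tr><td></td><td>spacer</td></tr>
    <tr><td>a</td><td>1</td></tr>
  </table></td></tr>
</table></body></html>"""


def cell_has_content(td: ET.Element) -> bool:
    """Rough analogue of the E4X test row.td[0].*.length() != 0:
    the cell has child elements or non-blank text."""
    return len(td) > 0 or bool((td.text or "").strip())


def rows_to_xml(page: str, table_path: str) -> str:
    """Select a table by a path relative to <html>, keep rows whose first
    cell has content, and return them under a single <table> root."""
    root = ET.fromstring(page)        # the <html> element
    table = root.find(table_path)     # ElementTree's limited XPath, 1-based [n]
    if table is None:
        raise ValueError("no element matches " + table_path)

    out = ET.Element("table")         # single root, like <table>{output}</table>
    for tr in table.findall("tr"):
        tds = tr.findall("td")
        if tds and cell_has_content(tds[0]):
            row = ET.SubElement(out, "row")
            row.extend(tds)           # add the cells, like <row>{row.*}</row>
    return ET.tostring(out, encoding="unicode")


if __name__ == "__main__":
    print(rows_to_xml(PAGE, "body/table/tr[2]/td/table"))
    try:
        ET.fromstring("<row/><row/>")  # two top-level elements, no single root
    except ET.ParseError as err:
        print("rejected:", err)
```

Run as a script, it prints a single `<table>` element with two `<row>` children. The spacer row, whose first cell is empty, is skipped. The script then shows that two sibling `<row>` elements without a common root are rejected by the parser.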
