YQL messes up the content

I'm trying to get the raw content of my gist so that I can display it.

Here is the code:

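function requestCrossDomain(url, cb) {

    // Build the YQL REST URL; the query text itself must be URI-encoded.
    var yql = "http://query.yahooapis.com/v1/public/yql?" +
          "q=" + encodeURIComponent('select * from html where url="' + url + '" ') +
          "&format=xml&callback=?"; // assumed: the posted line was cut off after the "+"

    $.getJSON(yql, function (data) {
        // assumed: the rest of the callback was not included in the post
        cb(data.results[0]);
    });
}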


data.results[0] does contain something, but it's garbage: the code between <script> and </script> is broken up by a <p> (in the render function). Why? What am I doing wrong?

Here is a jsfiddle. As you can see, only part of the code between the <script> tags is shown.

2 Replies
  • Hey Zhitao, I stumbled upon the same issue. Did you find a solution? I'm thinking about using another scraper altogether. Can you recommend one?

  • In my case the expected response is JSON like this:

    {"html":"<li class=\"cross\" onclick=\"var ie = ieCheck(); if(ie > -1 && ie <=8) window.event.cancelBubble = true;\">hey<\/li>"}

    YQL instead returns a "messed up" value for the "html" key. The response carries an incorrect Content-Type of text/html, so YQL "correctly" tries to repair the body as if it were HTML. The damage starts at the first unescaped ">" inside the onclick attribute. The original attribute

    onclick=\"var ie = ieCheck(); if(ie > -1 && ie <=8) window.event.cancelBubble = true;\">hey

    is "interpreted" as:

    ie="ieCheck();if(ie" onclick="\&quot;var"> -1 &amp;&amp; ie &lt;=8)window.event.cancelBubble = true;\"&gt;hey
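To make the failure concrete, here is a minimal JavaScript sketch. It is not YQL's actual code, only an illustration under the assumption that an HTML parser does not treat the JSON-escaped \" as the start of a quoted attribute value. It takes the response body quoted in this reply and reads it once as JSON and once the way an HTML parser would see the start tag. The variable names are made up for the example.

```javascript
// Raw response body as quoted in this reply: valid JSON, but served as text/html.
// String.raw keeps the backslashes exactly as they appear in the body.
const rawBody = String.raw`{"html":"<li class=\"cross\" onclick=\"var ie = ieCheck(); if(ie > -1 && ie <=8) window.event.cancelBubble = true;\">hey<\/li>"}`;

// Read as JSON, the \" escapes are resolved and the markup survives intact.
const asJson = JSON.parse(rawBody).html;

// Read as HTML, \" does not open a quoted attribute value, so nothing
// protects the first ">" and the <li> start tag ends right there.
const tagStart = rawBody.indexOf("<li");
const firstGt = rawBody.indexOf(">", tagStart);
const tagSeenAsHtml = rawBody.slice(tagStart, firstGt + 1);

console.log(asJson);
console.log(tagSeenAsHtml);
```

The second value stops at "if(ie >", which is where the onclick in the YQL output falls apart. Parsing the raw body as JSON avoids the problem entirely, which is why a raw mode would help.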

    So, given that

    1. this is a JSONP request,
    2. and the misbehaving site is a third party I have no relationship with,
    3. and there is no way I can make them send a correct Content-Type in a reasonable amount of time,
    4. and YQL is only half right (I understand that automatic correction can be useful for further processing by YQL, but automatic correction is almost by definition what messes things up unexpectedly, as in this case, where the "wrong" HTML would be rendered correctly by a browser),
    5. and I want to use YQL,

    a good solution would be for YQL to offer an option in the request URL so that we ALWAYS get the raw result, ignoring the Content-Type.

    What about something like this?

    ... and compat="raw"
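As a rough sketch of how such a request might be built, assuming the proposed clause existed: the helper name buildYqlUrl and its rawMode parameter are hypothetical, the endpoint is the one from the question, and compat="raw" is only the suggestion made in this reply, not an option confirmed to exist in YQL.

```javascript
// Endpoint taken from the question in this thread.
const YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: builds a YQL request URL for the html table.
// When rawMode is true it appends the PROPOSED clause compat="raw"
// (a suggestion from this thread, not a confirmed YQL feature).
function buildYqlUrl(targetUrl, rawMode) {
    let query = 'select * from html where url="' + targetUrl + '"';
    if (rawMode) {
        query += ' and compat="raw"';
    }
    // The whole query goes into the q parameter, URI-encoded.
    return YQL_ENDPOINT + "?q=" + encodeURIComponent(query);
}

console.log(buildYqlUrl("http://example.com/data", true));
```

Keeping the switch inside the query text, rather than as a separate URL parameter, matches the form suggested above and would leave existing queries unchanged.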


