
YQL Query returning invalid XML

I'm currently using the following YQL statement to scrape page content from sites, where {EncodedURI} is the percent-encoded URL of the page I am scraping (e.g. http%3A%2F%2Fstackoverflow.com):

 SELECT * FROM html WHERE url="{EncodedURI}"
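For reference, this is roughly how the statement gets built. It is a minimal Python sketch, not my exact code; the function name `build_statement` is made up for illustration.

```python
from urllib.parse import quote


def build_statement(page_url: str) -> str:
    """Return the YQL statement for scraping page_url (hypothetical helper)."""
    # safe="" makes quote() escape ':' and '/' as well, which gives the
    # fully percent-encoded form used for {EncodedURI}.
    encoded_uri = quote(page_url, safe="")
    return f'SELECT * FROM html WHERE url="{encoded_uri}"'


if __name__ == "__main__":
    print(build_statement("http://stackoverflow.com"))
```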

For most URLs this works just fine: the XML is valid and my application carries on. But for some URLs I get invalid XML back. Interestingly, one URL I've discovered that causes this error is http://en.wikipedia.org/wiki/God.

On closer inspection, the content near the end of the response appears to be the issue. I've changed the XML at the end of the response from:

 </body></results></query>>!-- Stotal: 288-->
<!-- Lengine9.yql.ac4.yhouocolm-->
<

to

</body></results></query><!-- Stotal: 288-->
<!-- Lengine9.yql.ac4.yhouocolm-->

and the XML becomes valid once more. This looks like a bug in the YQL response. Does anybody know how to avoid it?
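One possible client-side workaround is to discard everything after the closing </query> tag before parsing, since the broken part in the example above sits entirely after it. Below is a minimal Python sketch, assuming the root element is always <query> as in the snippet above. The function names are hypothetical and the sample response is shortened and made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the response root is <query>, as in the snippet above.
CLOSING_ROOT = "</query>"


def strip_trailing_garbage(raw_xml: str) -> str:
    """Cut off whatever follows the last closing root tag."""
    end = raw_xml.rfind(CLOSING_ROOT)
    if end == -1:
        raise ValueError("no closing </query> tag found in response")
    return raw_xml[: end + len(CLOSING_ROOT)]


def parse_yql_response(raw_xml: str) -> ET.Element:
    """Parse the response, retrying once with the tail removed."""
    try:
        return ET.fromstring(raw_xml)
    except ET.ParseError:
        # Only fall back when the untouched response is not well-formed.
        return ET.fromstring(strip_trailing_garbage(raw_xml))


if __name__ == "__main__":
    # Made-up, shortened response ending in the same broken tail.
    broken = (
        "<query><results><body><p>hi</p></body></results></query>>!-- Stotal: 288-->\n"
        "<!-- Lengine9.yql.ac4.yhouocolm-->\n"
        "<"
    )
    root = parse_yql_response(broken)
    print(root.tag)
```

This drops the trailing comments as well, which should be harmless if the application only reads the content inside <results>. It does not explain why the tail is malformed in the first place.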

NOTE: This issue doesn't seem to happen when diagnostics mode is on (&diagnostics=true), but that's not really an ideal solution.
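For anyone trying to reproduce this, the two request variants can be built as sketched below in Python. The endpoint URL and the `format=xml` parameter are assumptions based on the public YQL service rather than something taken from my code, and `build_request_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: the public YQL endpoint; adjust if you use a different one.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_request_url(statement: str, diagnostics: bool = False) -> str:
    """Build the GET URL for a YQL statement (hypothetical helper)."""
    params = {"q": statement, "format": "xml"}
    if diagnostics:
        # Adds the &diagnostics=true flag mentioned in the note above.
        params["diagnostics"] = "true"
    # urlencode() escapes the whole statement for use in a query string.
    return f"{YQL_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    statement = 'SELECT * FROM html WHERE url="http%3A%2F%2Fstackoverflow.com"'
    print(build_request_url(statement))
    print(build_request_url(statement, diagnostics=True))
```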
