urgent!!! charset for html doesn't work anymore

As described in the help guide (http://developer.yahoo.com/yql/guide/yql-l18n.html), I add a "charset" parameter to my YQL queries to set the character encoding of the response body when processing html. This has worked for a long time.
For example:
select * from html where url = 'http://sports.qq.com/a/20110401/000609.htm' and xpath='//div[@id="Cnt-Main-Article-QQ"]' and charset='gb2312'

However, since 03/31/2011 all gb2312-encoded content in the results displays as error characters (like: �). I'm 100% sure the website's charset did not change on 03/31 and that it remains "gb2312".
Websites encoded in utf-8 still work as usual. It looks like the results are output as utf-8 no matter what I set in the charset parameter.
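
The symptom described above matches what happens when gb2312 bytes are decoded as utf-8. Here is a minimal Python sketch of that effect. The sample text is an illustrative assumption and is not taken from the pages in question:

```python
# Assumption: illustrative sample text, not content from the affected pages.
text = "体育新闻"  # "sports news"

# Bytes as a gb2312-encoded page would serve them.
raw = text.encode("gb2312")

# Decoding with the page's real charset round-trips cleanly.
decoded_ok = raw.decode("gb2312")

# Decoding the same bytes as utf-8 replaces each invalid sequence
# with U+FFFD, the replacement character shown as "�".
garbled = raw.decode("utf-8", errors="replace")

print(decoded_ok)
print(garbled)
```

If the charset parameter were ignored and utf-8 always used, utf-8 pages would be unaffected while gb2312 pages would come out like `garbled`.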

So I suspect something changed in YQL on 03/31.
This issue is really annoying; I would appreciate it if you could take a look at it ASAP.

Below is a good example for you to debug:
http://query.yahooapis.com/v1/public/yql?q...iagnostics=true
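
For anyone reproducing this, a request URL of this shape can be assembled by URL-encoding the query. This is only a sketch: `build_yql_url` is a hypothetical helper, and only the `q` and `diagnostics=true` parameters are assumed from the visible parts of the truncated URL above. Whatever else the elided part contained is not reconstructed here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public endpoint as it appears in the URL above.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(page_url, xpath, charset):
    """Hypothetical helper: build a YQL html-table request URL."""
    query = (
        f"select * from html where url = '{page_url}' "
        f"and xpath='{xpath}' and charset='{charset}'"
    )
    # Assumption: only 'q' and 'diagnostics' are sent; urlencode handles
    # the quotes, slashes and brackets inside the query.
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "diagnostics": "true"})


url = build_yql_url(
    "http://sports.qq.com/a/20110401/000609.htm",
    '//div[@id="Cnt-Main-Article-QQ"]',
    "gb2312",
)
print(url)

# Round-trip check: the charset clause survives URL encoding.
params = parse_qs(urlsplit(url).query)
```

Swapping the `charset` argument between "gb2312" and "utf-8" gives two URLs whose results can be compared side by side.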

3 Replies
  • QUOTE (javafish1 @ Apr 1 2011, 01:14 AM)
    As described in the help guide (http://developer.yahoo.com/yql/guide/yql-l18n.html), I add a "charset" parameter to my YQL queries to set the character encoding of the response body when processing html. This has worked for a long time.
    For example:
    select * from html where url = 'http://sports.qq.com/a/20110401/000609.htm' and xpath='//div[@id="Cnt-Main-Article-QQ"]' and charset='gb2312'

    However, since 03/31/2011 all gb2312-encoded content in the results displays as error characters (like: �). I'm 100% sure the website's charset did not change on 03/31 and that it remains "gb2312".
    Websites encoded in utf-8 still work as usual. It looks like the results are output as utf-8 no matter what I set in the charset parameter.

    So I suspect something changed in YQL on 03/31.
    This issue is really annoying; I would appreciate it if you could take a look at it ASAP.

    Below is a good example for you to debug:
    http://query.yahooapis.com/v1/public/yql?q...iagnostics=true


    Sorry, I made a mistake in my previous post. The example should instead be:
    select * from html where url = 'http://sports.sina.com.cn/o/2011-03-25/19105504837.shtml' and xpath='//div[@id="artibody"]' and charset='gb2312'
  • QUOTE (javafish1 @ Apr 1 2011, 01:14 AM)
    However, since 03/31/2011 all gb2312-encoded content in the results displays as error characters (like: �). I'm 100% sure the website's charset did not change on 03/31 and that it remains "gb2312".
    Websites encoded in utf-8 still work as usual. It looks like the results are output as utf-8 no matter what I set in the charset parameter.

    So I suspect something changed in YQL on 03/31.

    select * from html where url = 'http://sports.sina.com.cn/o/2011-03-25/19105504837.shtml' and xpath='//div[@id="artibody"]' and charset='gb2312'


    Hi javafish1,

    We have a regression in the html table functionality as of 03/31 where the 'charset' parameter specified in the html table query is always defaulting to UTF-8. We have a fix for this and plan to deploy it soon.

    Thanks.
  • QUOTE (Amrish @ Apr 1 2011, 01:26 PM)
    Hi javafish1,

    We have a regression in the html table functionality as of 03/31 where the 'charset' parameter specified in the html table query is always defaulting to UTF-8. We have a fix for this and plan to deploy it soon.

    Thanks.


    Great, many thanks for your quick response and fix. Just let me know once you have deployed it.
    :) have a nice day~