Encoding issue in the social.updates.search result set

Hi guys, I am doing some testing on the social.updates.search results.

I'm getting parsing errors caused by the encoding used in the result set.

Try this query in the YQL interface:

CODEBOX
select * from social.updates.search(0,100) where link = "http://twitter.com/bart/statuses/11384388744"


it will give this back as a title:
CODEBOX
<title><![CDATA[&quot;Connected Home blog » Nokia en Awox komen namens de DLNA industrie groep spreken tijden het Connected Home Event&quot; ( http://bit.ly/crMSx2 )]]></title>


Note that the » sign (right angle quote, or &#187;) isn't known in UTF-8; I think it should be encoded as &raquo;.
I also found that characters such as ë, ï, etc. fail for the same reason.

If I change the encoding to
CODEBOX
encoding="ISO-8859-1"
it works without any problems.
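
A minimal Python sketch of one way this symptom can arise, assuming (this is not confirmed in the thread) that the response declares UTF-8 while the » is actually sent as the single ISO-8859-1 byte 0xBB. The `parse_title` and `parses` helpers and the shortened sample title are hypothetical, and the sketch does not call YQL:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the title above; "\u00bb" is the » character.
SAMPLE_TITLE = "Connected Home blog \u00bb Nokia en Awox"


def parse_title(declared_encoding: str, payload_encoding: str) -> str:
    """Build a tiny XML document whose declaration names one encoding
    while its bytes are written in another, then parse it."""
    declaration = f'<?xml version="1.0" encoding="{declared_encoding}"?>'
    body = f"<title><![CDATA[{SAMPLE_TITLE}]]></title>"
    # The declaration is pure ASCII; only the body carries the » character.
    document = declaration.encode("ascii") + body.encode(payload_encoding)
    return ET.fromstring(document).text


def parses(declared_encoding: str, payload_encoding: str) -> bool:
    """Return True if the document parses, False on an XML parse error."""
    try:
        parse_title(declared_encoding, payload_encoding)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Declaration and bytes agree: both combinations should parse.
    print(parses("UTF-8", "utf-8"))
    print(parses("ISO-8859-1", "iso-8859-1"))
    # Declared UTF-8, but bytes are ISO-8859-1: the lone 0xBB byte
    # is not valid UTF-8, so the parser should reject the document.
    print(parses("UTF-8", "iso-8859-1"))
```

Under that assumption, only the mismatched case fails, which would be consistent with the document parsing once the declared encoding is switched to ISO-8859-1.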

I don't really want to download the content first and then modify the headers. Any chance this can be fixed soon?

Cheers,

René

2 Replies
  • QUOTE (Rene @ Apr 14 2010, 05:07 AM)

    Thanks for reporting the issue. We'll take a look and get back to you.

    -- Nagesh
  • I believe » is a valid UTF-8 character, and it displays correctly in a browser using UTF-8 text encoding, so I'm not sure where your problem is coming from. Can you provide more details?

    &raquo; (and so on) are HTML character entity references; they have nothing to do with UTF-8.

    Jonathan
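
The distinction drawn in the reply above can be checked with a short Python sketch. It only uses the standard library; the `decodes_as_utf8` helper is a hypothetical name introduced for illustration:

```python
import html

RAQUO = "\u00bb"  # the » character, code point U+00BB

# The same character has different byte representations per encoding.
utf8_bytes = RAQUO.encode("utf-8")         # two bytes: 0xC2 0xBB
latin1_bytes = RAQUO.encode("iso-8859-1")  # one byte: 0xBB


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# HTML entity references are a separate, text-level escaping mechanism:
# both the named and the numeric form resolve to the same character.
named = html.unescape("&raquo;")
numeric = html.unescape("&#187;")

if __name__ == "__main__":
    print(utf8_bytes, decodes_as_utf8(utf8_bytes))
    print(latin1_bytes, decodes_as_utf8(latin1_bytes))
    print(named == RAQUO, numeric == RAQUO)
```

So » is representable in UTF-8, but a consumer that receives the single ISO-8859-1 byte while expecting UTF-8 would see invalid input; whether that is what happened here is not established in the thread.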