I'm trying to build a pipe and I'm stuck with a YQL issue. I need a query like this:
CODEBOX
select * from html where url="some_url" and xpath='//div[@class="some_class"]/table/tr/td[p="some_text"]/h2/a'
The query runs fine when some_text contains only ASCII characters. But it wrongly returns null results when some_text contains non-ASCII characters, such as extended Latin characters or Greek characters, which is what I need to use it with. The page at some_url that the query runs against is UTF-8 encoded and displays just fine in a browser.
I've set up a test file at http://mytests.atwebpages.com/test2.html duplicating the structure of the source page I'm working with.
This query returns 2 items as expected:
CODEBOX
select * from html where url="http://mytests.atwebpages.com/test2.html" and xpath='//div[@class="article-list-entry"]/table/tr/td[p="Nikos"]/h2/a'
The following two return null, whereas they should return 1 item each:
CODEBOX
select * from html where url="http://mytests.atwebpages.com/test2.html" and xpath='//div[@class="article-list-entry"]/table/tr/td[p="Γιάννης Αγιάννης"]/h2/a'
CODEBOX
select * from html where url="http://mytests.atwebpages.com/test2.html" and xpath='//div[@class="article-list-entry"]/table/tr/td[p="ÁÉÍÓÚÅÄandÖ"]/h2/a'
The first query has only ASCII characters in the [p="match"] predicate; the latter two do not.
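One way to rule out the XPath expression itself is a small local check outside YQL. The sketch below is only an illustration: the SAMPLE markup is a hypothetical stand-in for the test page (the real markup may differ), find_links is a made-up helper name, and it relies on the limited XPath subset in Python's standard xml.etree.ElementTree, not on whatever engine YQL uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the test page; the real markup may differ.
SAMPLE = """<html><body>
<div class="article-list-entry"><table>
<tr><td><h2><a href="/a1">First</a></h2><p>Nikos</p></td></tr>
<tr><td><h2><a href="/a2">Second</a></h2><p>Γιάννης Αγιάννης</p></td></tr>
<tr><td><h2><a href="/a3">Third</a></h2><p>ÁÉÍÓÚÅÄandÖ</p></td></tr>
<tr><td><h2><a href="/a4">Fourth</a></h2><p>Nikos</p></td></tr>
</table></div>
</body></html>"""


def find_links(author):
    """Return the href of every link whose sibling <p> text equals author."""
    root = ET.fromstring(SAMPLE)
    # Same shape as the YQL xpath, written relative to the root element.
    path = ('.//div[@class="article-list-entry"]/table/tr/'
            'td[p="%s"]/h2/a' % author)
    return [a.get("href") for a in root.findall(path)]


if __name__ == "__main__":
    for name in ("Nikos", "Γιάννης Αγιάννης", "ÁÉÍÓÚÅÄandÖ"):
        print(name, find_links(name))
```

If the non-ASCII predicates match locally like this, the expression is probably fine and the problem is more likely in how the query or the fetched page is encoded on the way to or from YQL.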
What is going wrong? Is there some limitation with non-ascii text values? Do I need to add a (convert?) function or a parameter somewhere?
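In case the request encoding matters, here is a rough sketch of building the request URL with the query percent-encoded as UTF-8. The endpoint URL, the format parameter, and the build_yql_url helper are assumptions for illustration, not something confirmed to fix the problem.

```python
from urllib.parse import urlencode

# Assumed public YQL endpoint; adjust if YQL is called some other way.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_url(page_url, author):
    """Build a request URL whose q parameter is percent-encoded as UTF-8."""
    xpath = ('//div[@class="article-list-entry"]/table/tr/'
             'td[p="%s"]/h2/a' % author)
    query = "select * from html where url=\"%s\" and xpath='%s'" % (page_url, xpath)
    # urlencode encodes non-ASCII characters as UTF-8 bytes by default.
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": "json"})


if __name__ == "__main__":
    print(build_yql_url("http://mytests.atwebpages.com/test2.html",
                        "Γιάννης Αγιάννης"))
```

The resulting URL contains only ASCII characters, so nothing between the client and YQL has to guess the encoding of the Greek text.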
I've been stuck on this for quite a while now. Thank you in advance for any help.