YQL Xpath vs Chrome Xpath (with empty tags?)


I'm using YQL to track the HTML content of a few web pages by XPath, but I ran into an annoying 'bug'.

Here is an example:


To get to the content itself, I'm using the element inspector in Chrome and copying the XPath of the div with class="content".


Using that XPath in YQL gives this query:

select * from html where url = 'http://www.scarlet.be/nl/packs/internet-tv-telefonie/' and xpath = '/html/body/div[1]/div[8]/div[1]'

The results of this query [http://y.ahoo.it/OxJhs] are empty.

While testing, I noticed that querying div[6] instead of div[8] does return the correct results. Looking into this, I found that there are two empty divs (class="clear") before div[8].

Is it possible that YQL ignores these empty tags and therefore counts the target as div[6]? If so, can I add a parameter to the query that tells YQL to count the empty divs too, so that my XPath from Chrome is correct?

Changing the input to div[6] is not an option. The users of this scraper are not very familiar with web technologies, so they won't understand the issue.
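To make the suspected behaviour concrete, here is a minimal Python sketch using only the standard library. It does not call YQL. It assumes (unconfirmed) that YQL's HTML parser drops empty divs before evaluating the XPath, and it uses a made-up miniature page rather than the real one; `PAGE`, `strip_empty_divs`, and `text_at` are hypothetical names for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature page: two empty "clear" divs precede the content div.
PAGE = """<html><body>
  <div id="wrapper">
    <div class="header">header</div>
    <div class="clear"></div>
    <div class="clear"></div>
    <div class="content">target</div>
  </div>
</body></html>"""


def strip_empty_divs(root):
    """Remove divs with no children and no text (the ASSUMED parser behaviour)."""
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = len(child) == 0 and not (child.text or "").strip()
            if child.tag == "div" and is_empty:
                parent.remove(child)
    return root


def text_at(root, path):
    """Return the text of the first node matching path, or None if nothing matches."""
    node = root.find(path)
    return None if node is None else node.text


browser_dom = ET.fromstring(PAGE)                     # what the browser inspector sees
stripped_dom = strip_empty_divs(ET.fromstring(PAGE))  # what the parser might keep

POSITIONAL = "body/div[1]/div[4]"      # index copied from the full DOM
BY_CLASS = ".//div[@class='content']"  # selects by attribute, not position

print(text_at(browser_dom, POSITIONAL))   # matches in the full DOM
print(text_at(stripped_dom, POSITIONAL))  # no match once the empty divs are gone
print(text_at(stripped_dom, BY_CLASS))    # still matches
```

If the assumption holds, a positional path copied from Chrome is off by the number of dropped empty siblings, while a path that selects on the class attribute does not depend on sibling counts at all. The equivalent attribute-based expression for the page in question would be something like //div[@class="content"], though I have not verified that against YQL.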
