Problems using SELECT ... FROM html to get all paragraph content when the paragraph contains line breaks

I'm well aware this might be just me doing something stupid, but using the following query:

CODE
SELECT strong,content 
FROM html
WHERE
url="http://www.english-heritage.org.uk/server/show/nav.1499"
AND
xpath="/html/body/div[2]/div/div[2]/div[3]/div[2]/div[2]/div[2]/div/div/div/div/p"


(Sorry about the ugly xpath)

This behaves as expected when I test it in the YQL console and view the XML returned - the full content of each paragraph is returned, including the strong tag. However, if I look at this in Tree View (or use it in a Pipe), only the content after the final <br /> tag is returned as the "content" element. (The "strong" element is preserved.)

I thought this might have been some 'summarising' in Tree View, but having run it through a Pipe and checked the subsequent RSS output, it seems that everything before the final <br /> is getting dropped on the floor, even though it's present in XML view.

Bug? Or my ignorance? And if so, can anyone point me at the right bit of the docs so I can educate myself and fix this?

Thanks in advance to anyone who can help.
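
To make the symptom concrete, here is a minimal Python sketch of one way it could arise. It is an assumption for illustration, not YQL's actual code: the sample paragraph and the naive_to_dict helper are hypothetical. The idea is that a converter which maps mixed content (text interleaved with child tags) onto a key/value structure has only one "content" slot, so each text chunk after a <br /> overwrites the previous one.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph shaped like the ones described above:
# a <strong> element followed by text broken up by <br /> tags.
SAMPLE = (
    "<p><strong>Opening times</strong>First line<br />"
    "Second line<br />Last line</p>"
)


def naive_to_dict(elem):
    """Flatten an element into a dict: one key per child tag plus 'content'.

    Every text chunk is written to the same 'content' key, so a later chunk
    overwrites an earlier one. Deliberately naive; not YQL's real converter.
    """
    out = {}
    if elem.text and elem.text.strip():
        out["content"] = elem.text.strip()
    for child in elem:
        if child.text and child.text.strip():
            out[child.tag] = child.text.strip()
        # Text that follows a child tag (e.g. after <br />) is its "tail".
        if child.tail and child.tail.strip():
            out["content"] = child.tail.strip()
    return out


def full_text(elem):
    """Collect every text chunk in document order, as the XML view shows it."""
    return " ".join(t.strip() for t in elem.itertext() if t.strip())


p = ET.fromstring(SAMPLE)
flattened = naive_to_dict(p)
complete = full_text(p)
print(flattened)
print(complete)
```

With this sample, the flattened form keeps "strong" but only the text after the last <br />, while the full text is still recoverable from the XML itself.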

3 Replies
  • QUOTE (skip.chris @ Aug 19 2009, 01:34 AM)
    [original post, quoted in full]

    When talking about Pipes, I believe you are referring to the YQL module. Instead, you can use a Fetch Data module with the URL you get from the "COPY URL" button for the REST query. Set the Fetch Data path to results.p.

    My guess is that your issue involves a problem YQL has with JSON output.
  • QUOTE (hapdaniel @ Aug 19 2009, 05:56 AM)
    [previous reply, quoted in full]

    Good thinking - I'll give that a whirl in Pipes and let you know how it goes.

    But... the 'issue' as I see it exists in the YQL console as well - content visible in XML view, not visible in Tree view. Anyone able to shed any light?
  • QUOTE (skip.chris @ Aug 19 2009, 07:48 AM)
    Good thinking - I'll give that a whirl in Pipes and let you know how it goes.


    And it goes great - thanks for the advice!
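
For anyone who wants to see what the REST URL behind the "COPY URL" button looks like, here is a rough Python sketch that builds one for the query in the original post. The endpoint address and the q/format parameter names are assumptions based on the public YQL service as I understand it, and build_rest_url is a hypothetical helper; the URL the console gives you is the authoritative one. XML is requested here because the guess above is that the trouble lies with the JSON output.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed public YQL REST endpoint; prefer the URL copied from the console.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

# The query from the original post, joined onto one line.
QUERY = (
    "SELECT strong,content FROM html "
    'WHERE url="http://www.english-heritage.org.uk/server/show/nav.1499" '
    'AND xpath="/html/body/div[2]/div/div[2]/div[3]/div[2]/div[2]'
    '/div[2]/div/div/div/div/p"'
)


def build_rest_url(query, fmt="xml"):
    """Return the endpoint with the query and output format URL-encoded."""
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


rest_url = build_rest_url(QUERY)
print(rest_url)
```

Paste the resulting URL into the Fetch Data module and set the path to results.p, as suggested above.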