String Operations and Relative URLs

Executing a query like the following:

CODE
select content from html where url in (select href from html where url='http://lists.nyphp.org/pipermail/talk/' and xpath='//html/body/table/tr[.]/td[2]/a[4]')


Results in a result set full of errors like "Bad host 2009-September", because the hrefs returned by the inner query are relative URLs.

1. It seems as though YQL should mimic the behavior of browsers and resolve relative URLs against the page they came from.

2. Since it apparently doesn't at this time, how would I manipulate the href returned by the inner query so that the hostname and path are prepended?

3. Ideally, I could pass in a pre-populated variable holding the parent URL of the href, but even if I had to prepend a URL manually, how would I concatenate two strings?

Thanks - exciting stuff.

3 Replies
  • QUOTE (Hans Z @ Oct 10 2009, 06:50 PM)
    Executing a query like the following:

    CODE
    select content from html where url in (select href from html where url='http://lists.nyphp.org/pipermail/talk/' and xpath='//html/body/table/tr[.]/td[2]/a[4]')


    Results in a result set full of errors like "Bad host 2009-September", because the hrefs returned by the inner query are relative URLs.

    1. It seems as though YQL should mimic the behavior of browsers and resolve relative URLs against the page they came from.

    2. Since it apparently doesn't at this time, how would I manipulate the href returned by the inner query so that the hostname and path are prepended?

    3. Ideally, I could pass in a pre-populated variable holding the parent URL of the href, but even if I had to prepend a URL manually, how would I concatenate two strings?

    Thanks - exciting stuff.


    Hmm, still no luck in figuring out a workaround for this - anyone have any thoughts? Seems as though this would severely limit the usefulness of YQL unless I'm totally missing something.

    Best,

    H
  • Hey Hans,

    I was thinking you could use a regex to massage those URLs, but the regex table (I wrote that one) doesn't currently support replacement.

    I'm working on revving that table.
  • You can try this now:

    CODE
    use "http://kid666.com/yql/regex.xml" as regex2; select * from regex2 where expression = "(.*)" and text in (select href from html where url='http://lists.nyphp.org/pipermail/talk/' and xpath='//html/body/table/tr[.]/td[2]/a[4]') and replacement = "http://lists.nyphp.org/pipermail/talk/$1";


    Use that while I get the regex table working in the community tables. The full query could be:

    CODE
    use "http://kid666.com/yql/regex.xml" as regex2; select * from html where url in (select match0 from regex2 where expression = "(.*)" and text in (select href from html where url='http://lists.nyphp.org/pipermail/talk/' and xpath='//html/body/table/tr[.]/td[2]/a[4]') and replacement = "http://lists.nyphp.org/pipermail/talk/$1")


    Just watch out for the 30 second execution limit when making these kinds of calls.
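
For anyone hitting the same wall: another option is to run only the inner query in YQL and resolve the hrefs in your own code before fetching them, which also keeps the second round of fetches out of the execution limit. A rough sketch in Python (not YQL), assuming the hrefs have already been pulled out of the inner query's results. The `resolve_hrefs` helper and the sample hrefs are made up for illustration; only the base URL comes from this thread.

```python
from urllib.parse import urljoin

# Page the inner query scraped its links from (taken from the thread).
BASE_URL = "http://lists.nyphp.org/pipermail/talk/"


def resolve_hrefs(base_url, hrefs):
    """Resolve each href against base_url, the way a browser would."""
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical hrefs standing in for the inner query's output.
sample_hrefs = [
    "2009-September/date.html",               # relative to the page
    "/pipermail/talk/2009-August/date.html",  # relative to the host root
    "http://example.com/page.html",           # already absolute
]

for url in resolve_hrefs(BASE_URL, sample_hrefs):
    print(url)
```

Unlike prepending a fixed string with a regex, `urljoin` also copes with hrefs that start with `/` or are already absolute, so those are not mangled.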