
Extract a URL from item.content with Loop and String Regex

Hi, I am trying to extract a URL with this pipe: http://pipes.yahoo.com/pipes/pipe.info?_id=434a496c553e8b5ad8a117d38c7f8c0b

The URL I want to extract is the first one encountered between

and

after the Tail box.

But it doesn't work, and I don't understand much about regex...

Can you help me with that, please? :)

6 Replies
  • Hi. First, in your message you either forgot the delimiters of what you want to retrieve, or they didn't survive posting. It helps to post a complete message if you want some help ;)

    Second, I had a look at your pipe, and the problem is pretty clear: the Regex module tells you that the content of 'content' is too long. You're feeding it more or less the whole page, and it doesn't handle that well. Either get a tighter cut with the Fetch Page module, or first use a simpler function instead of the heavy tank that is regex, such as a substring inside a Loop operator, as I did here: http://pipes.yahoo.com/luneart/a3d83624aa205c2177d603b9bfb9764a Enjoy!
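
Pipes is visual, so there is no code to copy from it, but the same two-step idea (cut the text down with a cheap substring search, then run the regex only on the small piece) can be sketched in Python. This is only an illustration outside Pipes: the `<h3>` delimiters are an assumption based on the later mention of h3 tags, since the original delimiters were lost, and `first_url_after` and the sample page are hypothetical.

```python
import re

# Hypothetical page: the real delimiters were lost from the post,
# so <h3> tags (mentioned later in the thread) are assumed here.
PAGE = """
<div>lots of markup before...</div>
<div class="Tail">
  <h3><a href="http://example.com/first">First</a></h3>
  <h3><a href="http://example.com/second">Second</a></h3>
</div>
"""

URL_RE = re.compile(r'href="(https?://[^"]+)"')


def first_url_after(page, marker, start_tag="<h3>", end_tag="</h3>"):
    """Return the first URL between start_tag and end_tag after marker, or None."""
    # Step 1: cheap substring cut, so the regex never sees the whole page.
    pos = page.find(marker)
    if pos == -1:
        return None
    start = page.find(start_tag, pos)
    if start == -1:
        return None
    end = page.find(end_tag, start)
    if end == -1:
        return None
    snippet = page[start:end]
    # Step 2: run the regex on the short snippet only.
    match = URL_RE.search(snippet)
    return match.group(1) if match else None


print(first_url_after(PAGE, "Tail"))  # http://example.com/first
```

Keeping the regex input short is the same reason a substring step before the Regex module helps inside the pipe.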

  • Thanks for this trick, Lolo! Actually, the day before yesterday I managed to extract the URL between the h3 tags for the first item, then I fetched that page to extract the XML file from the song's page (it contains only one item describing the song).

    But I couldn't extract the URL pointing to this XML file, because it's between tags, and neither Fetch Page nor XFetch could see it... So unless there is another trick here (1), I will have to build the items one by one for the output RSS feed I'm trying to build.

    I also thought of something else: the page contains only the user's last 10 favorites, so if the user likes more than 10 songs in a very short time, some songs will very likely be missed because they end up on the next page. And I don't know if it's possible to build a pipe that can detect whether all the songs are new and, if so, open the next page (2). Anyway, as you might guess, I won't be able to build such pipes in a reasonable amount of time ^^

    I thought of another solution, which would be to build something with the SoundCloud API, but that's way beyond my abilities!

    So unless there is a solution for points (1) and (2), this project of mine is dead :)

  • §2, line 1: I'm talking about meta tags.
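
If the XML URL sits in the `content` attribute of a `<meta>` tag, an ordinary HTML parser can read it once you have the page source. A minimal Python sketch using the standard library's `html.parser`, assuming the wanted URL is the first meta `content` value ending in `.xml`; the meta name `hypothetical-song-data` in the sample is made up, since the real page's tags aren't shown in the thread.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the attributes of every <meta> tag as dicts."""

    def __init__(self):
        super().__init__()
        self.metas = []  # one attribute dict per <meta> tag

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; self-closing <meta ... /> tags
        # also reach this method via the default handle_startendtag.
        if tag == "meta":
            self.metas.append(dict(attrs))


def find_xml_url(html):
    """Return the first meta content value ending in .xml, or None."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    for attrs in parser.metas:
        content = attrs.get("content") or ""  # valueless attributes map to None
        if content.endswith(".xml"):
            return content
    return None


# Hypothetical page head for illustration only.
SAMPLE = """<html><head>
<meta charset="utf-8">
<meta name="description" content="A song page">
<meta name="hypothetical-song-data" content="http://example.com/song.xml" />
</head><body></body></html>"""

print(find_xml_url(SAMPLE))  # http://example.com/song.xml
```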

  • I'm going to address your (2), then (1) (so you can play with it and master it :P), then propose a simpler solution (also, grrr.).

    (2): I just went to the page you're trying to get your content from, and honestly I didn't even think Pipes could fetch it, since it's all JavaScript (Ctrl+U in Firefox shows a page's source code, and here it amounts to little more than <script> markup...). You could check whether all, most, or a certain number of items fall within a time range, but I don't see how to fetch a "2nd" page (I only dabble with Pipes, unlike some other users I'm sure, so maybe someone else has a solution!).
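
As a rough illustration of the time-range check for (2), here is a Python sketch that flags when a full page of items is entirely newer than the last time the pipe ran, which is exactly when favorites could have spilled onto a second page. It only detects the situation; it does not fetch the next page. It assumes you can get one timestamp per item, which may not be possible on a JavaScript-rendered page; `may_have_missed_items` and the sample times are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The page lists only the user's last 10 favorites, per the thread.
PAGE_SIZE = 10


def may_have_missed_items(item_times, last_check):
    """Return True if a full page of items is entirely newer than last_check.

    In that case some new favorites may already have been pushed onto
    the next page, which a single fetch would not see.
    """
    if len(item_times) < PAGE_SIZE:
        # Fewer than a full page: the user has no older favorites beyond it.
        return False
    return all(t > last_check for t in item_times)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary example time
last_check = now - timedelta(hours=1)

# Ten favorites added in the last 45 minutes: all newer than the last check.
burst = [now - timedelta(minutes=5 * i) for i in range(10)]
# Five new favorites followed by five older ones.
mixed = burst[:5] + [now - timedelta(hours=2)] * 5

print(may_have_missed_items(burst, last_check))  # True
print(may_have_missed_items(mixed, last_check))  # False
```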

    (1): Now, fetching a page from within an item is harder, but I think you can do it with the YQL module, which will understand the URL even if it's passed in as item.link. However, I don't know this module well, so I can't really help you there.

    Better solution: you could use the "Feed Auto-Discovery" module on your first URL to find an RSS feed for the user, which would solve (2) and probably (1).
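
Feed autodiscovery conventionally relies on `<link rel="alternate">` tags with an RSS or Atom MIME type in a page's HTML. The internals of the Pipes module aren't described here, so this Python sketch only shows that general convention; `discover_feeds` and the sample page are hypothetical, and a page that builds its content with JavaScript may not expose such tags in its raw source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised by <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        href = a.get("href")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds


# Hypothetical page head for illustration only.
SAMPLE_HEAD = """<head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="Favorites"
      href="/users/someone/favorites.rss">
</head>"""

print(discover_feeds(SAMPLE_HEAD, "http://example.com/someone"))
# ['http://example.com/users/someone/favorites.rss']
```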

    Even better solution, AND WHAT YOU SHOULD HAVE STARTED WITH: search for SoundCloud pipes: http://pipes.yahoo.com/pipes/search?q=soundcloud&x=0&y=0 As you can see, the API is really simple to use, and people have already done what you're trying to do. Clone the pipes and adapt them if needed.

  • Actually, the Loop/Fetch Page approach using the URL within the item should work. Maybe something is wrong with the link or with the page it points to?

  • Thanks again, man ;) I'll look at all this later, probably not before next week, and I'll keep you posted! Cheers

