Appending item title in brackets with the category filter tag using regex?

Hi guys,

As the title implies, I'm trying to mark each of my feed articles with the category tag it was filtered by, very much like this example in the help pages. More specifically, this line from the example:

"After filtering, each feed is then sent through its own Regex module, which we use to annotate each item title with its corresponding category. So a story titled 'Overweight kids face widespread stigma (AP)' becomes '[Weight] Overweight kids face widespread stigma (AP)'."

Here is the feed source for that example: http://pipes.yahoo.com/pipes/pipe.edit?_id=QMrlL_FS3BGlpwryODY80A

However, I'm trying to scale that example to a case where multiple category tags are being filtered for, so using one Regex module per category isn't practical. My understanding of these modules is very weak (even after reading the documentation). How would I scale that example so that multiple tags are added to article titles?

Here is my feeble attempt to make it work for my use case: http://pipes.yahoo.com/pipes/pipe.edit?_id=3a212999409e126162e7dfeeca3b9374

Clearly asking the regex module to append [item.category] with a backreference isn't working, so how should I phrase it?

Thanks for any insight.

9 Replies
  • hello,

    converting items to strings or serializing items isn't Pipes' strong suit, hence the difficulty. Your attempt wasn't so bad, but: first, you must specify the regex modes/modifiers you use, generally s and sometimes g (see here for details); second, to reference the content of another element (sub-item) in a regex, use the dollar sign (the same way you recall a saved character sequence, but with the element name instead of the number) and braces {}, giving ${mySubItem}. See this pipe, which is a corrected version of yours, plus the rest of this message on the right.

    Other than that, more fundamentally, if you call ${category}, you'll get only the first value. As there is no clean way to do the serialization (to my knowledge at least), see the loop/string builder on the right: it takes the content of each category sub-item (at least the first twelve, if they exist) and appends a space in between. Spaces, beginning of string and end of string are really easy to detect in a regex, which makes it possible to add the square brackets. Then all that is left to do is to add the whole string to the title.

    enjoy playing with pipes ;)
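
The serialize-then-bracket idea from this reply can be sketched outside Pipes. This is only a rough Python analogue, not the pipe itself: the item layout (a dict with "title" and "category" keys) and the function names are assumptions made for illustration. Like the space-separated approach described above, it would split a category that itself contains a space into several tags.

```python
import re

def bracket_categories(categories, limit=12):
    """Join category values with spaces, then wrap each one in brackets."""
    # Serialize: mirrors the loop/string builder step (first `limit` values only).
    joined = " ".join(c.strip() for c in categories[:limit] if c.strip())
    # Every run of non-space characters becomes "[run]".
    return re.sub(r"(\S+)", r"[\1]", joined)

def annotate(item):
    """Return a copy of the item with the bracketed tags in front of the title."""
    tags = bracket_categories(item.get("category", []))
    title = f"{tags} {item['title']}" if tags else item["title"]
    return {**item, "title": title}

# Hypothetical item, for illustration only.
item = {"title": "Example Article Title", "category": ["Productivity", "Memory"]}
print(annotate(item)["title"])
```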

  • Thanks for the reply Lolo! The pipe looks insane!

    I clearly need to work on my creative pipe making skills as I would have never thought of using those modules. Then again, I've never heard of them either...

    At any rate, by "no serialization", do you mean that there's no way to annotate the item title with only the category tag it was filtered by? In other words, "[Productive] Example Article Title" or "[Memory] Example Article Title", strictly using the filtered keywords to annotate each article title.

    If this issue of "no serialization" is a byproduct of the way I've filtered the article's category tags, is there another more efficient filtering module or method I could use to reach the intended format (as listed above)?

    Hopefully you or another clever forum/staff member (Paul Donnelly) can combine powers to make something even sweeter!

    Thanks again for the help!

  • If you just want the tag(s) of the filter, it's way easier: just append by hand before the union.
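
As a rough illustration of "append by hand before the union", here is a Python analogue in which each filter branch prefixes its own tag and the branches are merged afterwards. The feed data and function names are hypothetical, and the union is modelled as plain concatenation, which is an assumption about how the branches are combined.

```python
def filter_and_tag(items, keyword):
    """One branch: keep items carrying `keyword` and prefix the title with it."""
    wanted = keyword.lower()
    return [
        {**item, "title": f"[{keyword}] {item['title']}"}
        for item in items
        if wanted in (c.lower() for c in item["category"])
    ]

def union(*branches):
    """Concatenate the branches in order (no de-duplication)."""
    return [item for branch in branches for item in branch]

# Hypothetical feed, for illustration only.
feed = [
    {"title": "Post A", "category": ["Productivity"]},
    {"title": "Post B", "category": ["Memory", "Productivity"]},
    {"title": "Post C", "category": ["Cooking"]},
]
tagged = union(filter_and_tag(feed, "Productivity"), filter_and_tag(feed, "Memory"))
for item in tagged:
    print(item["title"])
```

In this sketch a post that matches two keywords comes out once per branch, each copy carrying a single tag.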

  • At the risk of sounding like an even bigger noob, can you show me in that clone how you'd lay it out? I understand how to append it if I'm just specifying one tag, but I'm still not sure how to do it cleanly with multiple tags.

    Thanks again for all the help, I really appreciate it. And sorry for being so dense.

  • done, check the pipe again.

    everyone is a n00b at some point ;)

  • Beautiful, thanks so much! It's almost perfect, except for the fact that it indiscriminately slaps each article with the same set of category tags that come out of the filter modules even if they don't apply on a case by case basis, correct? In other words, not all articles that come out of the left filter module are tagged by lifehacker with both [productivity] and [memory], but the looper will annotate both regardless.

    All I dream of is that my list of 15 keyword tags filters the lifehacker articles and then tags each one on a case by case basis accordingly (without having to messily open 20 filter/looper modules for each). Is this what you meant by Pipes' issue with serialization? Guess there's no way to effectively resolve this then, huh? :/

  • anything's possible, just need to find the right lever. but seriously, either you have to explain better, or I really need better reading/understanding skills ^^

    So, in this new light, I'd suggest cross-filtering against an enhanced list of keywords. Basically, you have 2 cases: the keywords of a particular post should match a set of keywords (e.g., all of them) or another set (the 2 cases here being the 'all', i.e. 'and', and the 'or').

    I would suggest the following algorithm:

    First, serialize all keywords in a single string (as I did in my first post). Second, append a new "search string" field to each of your items (loop/string builder for example) containing all of the keywords in a regex format. For example, if you want all posts whose keywords match (ex1 and ex2) or ex4, you'd write ((.*ex1.*ex2.*)|(.*ex4.*)). [This is not perfect because it supposes that keyword ex1 will always come before keyword ex2, but that's not so far-fetched. Check if the website has some rule here, maybe the order of its website categories?]

    Third, filter the posts AS REGEX, matching the category string against this newly built string.

    Once this filtering is done (btw, no branches, everything is done on the same thread, hence simple to modify), all that is left is more or less string layout. Use the Regex module on the category string (now useless) and match it against the search string; put $1 in the "with" field. This step looks for the tag(s) used for the filtering, saves them in a capture group, and $1 recalls that group to leave only the filtering tag(s) in the string.

    That's it. In your categories string you now have only the keywords used for the filtering, so add brackets or whatever and append it to the title.

    I won't do it for you because it was wrong of me to do it before, but I'll be happy to help you along the way!
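
The workflow in this reply (serialize the categories, build a search regex from the keywords, filter on it, then keep only the matching tags for the title) can be sketched in Python as follows. This is a sketch under stated assumptions, not the pipe itself: the keyword list, the item layout and the function name are hypothetical, it covers only the 'or' case, and it uses an alternation with `re.findall` in place of the single `$1` replacement, so keyword order in the category string does not matter.

```python
import re

# Hypothetical keyword list; the real one would hold the 15 filter tags.
KEYWORDS = ["productivity", "memory", "sleep"]

# One alternation over all keywords: (?i) makes it case-insensitive,
# \b keeps matches to whole words, re.escape guards special characters.
SEARCH = re.compile(r"(?i)\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b")

def tag_items(items):
    """Keep items whose categories hit a keyword; prefix only the hits."""
    tagged = []
    for item in items:
        # Step 1: serialize the categories into one string.
        category_string = " ".join(item.get("category", []))
        # Steps 2 and 4: search with the keyword regex and keep the captures.
        hits = SEARCH.findall(category_string)
        if not hits:
            continue  # Step 3: drop items that match no keyword.
        prefix = " ".join(f"[{hit}]" for hit in hits)
        tagged.append({**item, "title": f"{prefix} {item['title']}"})
    return tagged

# Hypothetical feed, for illustration only.
feed = [
    {"title": "Post A", "category": ["Productivity", "Email"]},
    {"title": "Post B", "category": ["Memory", "Productivity"]},
    {"title": "Post C", "category": ["Cooking"]},
]
for item in tag_items(feed):
    print(item["title"])
```

Each surviving post carries only the tags it actually has, in the order they appear in its own category list.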

  • That was a great depiction of the workflow and I've got it sorted out! Thanks again! I just wish YP had a module that allowed joining data streams without duplicating/mixing data, and conditional statements would have been nice too. The i flag for case sensitivity seems to be slightly out of whack as well, or maybe I just need to tweak the pipe a bit more.

    I'll be back if I ever need a little help, thanks so much Lolo!

  • put (?i) at the beginning of your regex for case insensitivity, it's safer ;)

    you're welcome!
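
The inline (?i) modifier behaves the same way in Python's `re` module, which makes for a quick check of what it does. The sample string and the helper name here are hypothetical.

```python
import re

def matches(pattern, text):
    """True if `pattern` is found anywhere in `text`."""
    return re.search(pattern, text) is not None

category_string = "Productivity Memory"  # hypothetical serialized categories

print(matches(r"memory", category_string))      # case-sensitive search
print(matches(r"(?i)memory", category_string))  # same search with (?i) in front

# The modifier also applies when substituting, e.g. to add the brackets.
bracketed = re.sub(r"(?i)\b(memory)\b", r"[\1]", category_string)
print(bracketed)
```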

