Max regex input length of 17500 exceeded

I'm getting a lot of blank articles from the following two Pipes:

http://pipes.yahoo.com/pipes/pipe.info?_id=60789ce98d18de7f2f9a63d03a36c60a
http://pipes.yahoo.com/pipes/pipe.info?_id=5be9b28057f1e98024cdad1bcc77b862

There are a couple of cases where the structure of the page means my XPath query is wrong but in the vast majority of cases, the XPath is correct.

I assume the cause is related to the error message "Max regex input length of 17500 exceeded", but I don't know how to fix it. I'm using the same format for a lot of other pipes with no problems at all; it's just Gizmodo and Lifehacker that are causing these issues.

I've tried removing all of the regex rules apart from the first one (which replaces the description text), to no avail.

Any help greatly appreciated.

4 Replies
  • Think I've fixed this by changing the regex line that replaces the description with the new content. I changed item.description to item.description.content and this appears to have fixed it.

    Strange that all my other pipes didn't have the same problem.

  • item.description.content

    In the items I looked at...

    • item.description existed for all of them.
    • item.description.content didn't exist.

    If you're lucky, you can avoid the error by using the String Tokenizer module in a Loop. The whole description is too big for the regex, but you can break it up into smaller pieces.

    Here's a lucky example:

    <div class="article">the first part of the article</div><div class="ad">annoying</div><div class="article">the second part of the article</div>

    You tokenize at <div class="ad"> and say you name the output item.STUFF

    • item.STUFF.0 will be the first part
    • item.STUFF.1 will be the broken ad div and the second part

    Or the annoying thing could be anything (social div etc) that conveniently always appears in the middle of the article.
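    To make the trick concrete, here's a rough Python sketch of the same idea (hypothetical names; Pipes does this with the String Tokenizer and Loop modules, not code): split the oversized description at the ad div so each piece stays under the regex input limit, then clean and rejoin the pieces.

    ```python
    import re

    # The example markup from above: article / ad / article
    html = ('<div class="article">the first part of the article</div>'
            '<div class="ad">annoying</div>'
            '<div class="article">the second part of the article</div>')

    # Tokenize at the ad div -- the equivalent of naming the output item.STUFF.
    # tokens[0] is the first article part; tokens[1] is the broken ad div
    # plus the second article part.
    tokens = html.split('<div class="ad">')

    # Each token is now small enough to regex individually, e.g. stripping
    # the leftover ad text before the next article div.
    cleaned = [re.sub(r'^.*?(?=<div class="article">)', '', t, flags=re.S)
               for t in tokens]
    description = ''.join(cleaned)
    # description now contains only the two article divs
    ```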

  • I can't edit my first post.

    Darn it. I misunderstood what you're doing.

    I think you can first use regex on the "blobs" to make sure they are as small as possible, and then create your description.

    If the blobs are truly gigantic - you might be able to use the string tokenizer trick.

  • Hmmm, I'm not sure I did fix the error: I just reran the cloned pipe I had edited and still got it. That explains why I was still getting blank articles.

    I guess this is just a quirk of the Gawker sites, and it's a bit frustrating as they've only just turned off their full-text feed. I did find a great service at fullrss.net which uses a centralised database of XPath information for sites. This seems to be working perfectly for Gizmodo and Lifehacker, so I'll stick with that.
