Caching Modified URLs by Varnish instead of the original requested URL

Guillaume Quintard guillaume.quintard at gmail.com
Fri Sep 1 14:58:03 UTC 2023


Thank you so much, Geoff, for that very useful knowledge dump!

Good call-out on the .*; I realized I had carried them over, too, when I
copy-pasted the regex from the pure VCL example (where it's needed) into
the vmod one.

And so, just to be clear about it:
- vmod-re is based on libpcre2
- vmod-re2 is based on libre2
Correct?

I see no way I'm going to misremember that, at all :-D

-- 
Guillaume Quintard


On Fri, Sep 1, 2023 at 7:47 AM Geoff Simmons <geoff at uplex.de> wrote:

> Sorry, I get nerdy about this subject and can't help following up.
>
> I said:
>
> > - pcre2 regex matching is generally faster than re2 matching. The point
> > of re2 regexen is that matches won't go into catastrophic backtracking
> > on pathological cases.
>
> Should have mentioned that pcre2 is even better at subexpression
> capture, which is what the OP's question is all about.
>
> > sub vcl_init {
> >      new query_pattern = re.regex(".*(q=)(.*?)(\&|$).*");
> > }
>
> OMG no. Like this please:
>
>         new query_pattern = re.regex("\b(q=)(.*?)(?:\&|$)");
>
> I have sent an example of a pcre regex with .* (two of them!) to a
> public mailing list, for which I will burn in hell.
>
> To match a name-value pair in a query string, use a regex with \b for
> 'word boundary' in front of the name. That way it will match either at
> the beginning of the query string, or following an ampersand.
>
> And ?: tells pcre not to bother capturing the last expression in
> parentheses (they're just for grouping).
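You can check the pattern's behaviour outside Varnish with a short Python sketch. It assumes Python's `re` module as a stand-in for pcre2. `re` is a different backtracking engine, but it accepts the same \b, lazy *? and (?:...) syntax used here. The helper `q_value` and the sample URLs are hypothetical, made up for illustration. The sketch shows three things: the value is captured in the second group, the word boundary rejects a name such as "faq=", and the non-capturing group leaves only two capture groups.

```python
import re

# Stand-in for the vmod-re pattern. Python's re is NOT pcre2, but this
# pattern only uses syntax that both engines share.
QUERY_PATTERN = re.compile(r"\b(q=)(.*?)(?:&|$)")

# The same pattern with a capturing last group, for comparison only.
CAPTURING_TAIL = re.compile(r"\b(q=)(.*?)(&|$)")


def q_value(url):
    """Hypothetical helper: return the q value from url, or None if absent."""
    m = QUERY_PATTERN.search(url)  # unanchored search anywhere in the URL
    return m.group(2) if m else None


print(q_value("/search?q=varnish&page=2"))  # varnish
print(q_value("/search?page=2&q=cache"))    # cache
print(q_value("/search?faq=1"))             # None: no word boundary before q
print(QUERY_PATTERN.groups, CAPTURING_TAIL.groups)  # 2 3
```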
>
> Avoid .* in pcre regexen if you possibly can. You can, almost always.
>
> With .* at the beginning, the pcre matcher first consumes the string all
> the way to the end, and then backtracks one character at a time, looking
> for a place where the rest of the pattern can start, in this case a 'q'.
> At every 'q' it finds while working backwards, it stops, tries to match
> the rest of the pattern, and backtracks further if that fails.
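Backtracking from the end also determines which 'q=' gets captured. The following hypothetical Python sketch uses `re` as a backtracking stand-in for pcre2 and a made-up URL that repeats the q parameter. With the leading .*, the matcher swallows the whole string and backs up until it reaches the last 'q='. The \b version scans forward and stops at the first one.

```python
import re

# Hypothetical URL with the q parameter repeated (not from the thread).
URL = "/search?q=first&page=2&q=last"

# Greedy leading .* consumes the whole URL, then backtracks character by
# character until (q=) can match, so it settles on the rightmost q=.
leading_dotstar = re.compile(r".*(q=)(.*?)(?:&|$)")

# The \b version scans forward and stops at the first q= it finds.
word_boundary = re.compile(r"\b(q=)(.*?)(?:&|$)")


def capture(pattern, url):
    """Return the second capture group (the q value), or None."""
    m = pattern.search(url)
    return m.group(2) if m else None


print(capture(leading_dotstar, URL))  # last
print(capture(word_boundary, URL))    # first
```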
>
> pcre2 fortunately has an optimization that ignores a trailing .* if it
> has found a match up until there, so that it doesn't busily match the
> dot against every character left in the string. So this time .* does no
> harm, but it's superfluous, and violates the golden rule of pcre: avoid
> .* if at all possible.
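You can check the claim that a trailing .* is superfluous when you only need the captures. The Python sketch below uses a hypothetical URL, with `re` standing in for pcre2. It says nothing about whether `re` implements the pcre2 optimization mentioned above. It compares the pattern with and without a trailing .*. The captured subexpressions are the same, and only the extent of the overall match changes.

```python
import re

URL = "/search?q=varnish&page=2"  # hypothetical example URL

without_tail = re.compile(r"\b(q=)(.*?)(?:&|$)")
with_tail = re.compile(r"\b(q=)(.*?)(?:&|$).*")

a = without_tail.search(URL)
b = with_tail.search(URL)

# The captured subexpressions are identical...
print(a.groups() == b.groups())  # True
# ...only the overall match (group 0) extends to the end of the string.
print(a.group(0))  # q=varnish&
print(b.group(0))  # q=varnish&page=2
```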
>
> Incidentally, this is an area where re2 does have an advantage over
> pcre2. The efficiency of pcre2 matching depends crucially on how you
> write the regex, because details like \b instead of .* give it hints for
> pruning the search. While re2 matching usually isn't as fast as pcre2
> matching against well-written patterns, re2 doesn't depend so much on
> that sort of thing.
>
>
> OK I can chill now,
> Geoff
> --
> ** * * UPLEX - Nils Goroll Systemoptimierung
>
> Scheffelstraße 32
> 22301 Hamburg
>
> Tel +49 40 2880 5731
> Mob +49 176 636 90917
> Fax +49 40 42949753
>
> http://uplex.de
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>