Caching Modified URLs by Varnish instead of the original requested URL

Geoff Simmons geoff at uplex.de
Fri Sep 1 14:45:04 UTC 2023


Sorry, I get nerdy about this subject and can't help following up.

I said:

> - pcre2 regex matching is generally faster than re2 matching. The point 
> of re2 regexen is that matches won't go into catastrophic backtracking 
> on pathological cases.

Should have mentioned that pcre2 is even better at subexpression 
capture, which is what the OP's question is all about.

> sub vcl_init {
>      new query_pattern = re.regex(".*(q=)(.*?)(\&|$).*");
> }

OMG no. Like this please:

	new query_pattern = re.regex("\b(q=)(.*?)(?:\&|$)");

I have sent an example of a pcre regex with .* (two of them!) to a 
public mailing list, for which I will burn in hell.

To match a name-value pair in a cookie, use a regex with \b for 'word 
boundary' in front of the name. That way it will match at the beginning 
of the Cookie value or following an ampersand (or any other non-word 
character), but not in the middle of a longer name such as 'seq='.

And ?: tells pcre not to bother capturing the last expression in 
parentheses (they're just for grouping).
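
For illustration, a minimal sketch of what the corrected pattern captures. 
It uses Python's re module purely for convenience, which is an assumption 
on my part: it is a backtracking matcher that accepts the same syntax for 
\b, .*? and (?:...), but it is not pcre2 and not the re VMOD. The sample 
strings and the q_value helper are invented for the example.

```python
import re

# Same pattern as above, minus the unneeded backslash before '&'.
QUERY_PATTERN = re.compile(r"\b(q=)(.*?)(?:&|$)")

def q_value(s):
    """Return the value of the first 'q' parameter in s, or None."""
    m = QUERY_PATTERN.search(s)
    return m.group(2) if m else None

print(q_value("q=varnish&page=2"))   # 'q' at the very start
print(q_value("page=2&q=varnish"))   # 'q' following an ampersand
print(q_value("seq=7&page=2"))       # 'q=' inside 'seq=': no word boundary
print(QUERY_PATTERN.groups)          # number of capturing groups left
```

The last group no longer captures, so only the name and the value come 
back as backreferences.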

Avoid .* in pcre regexen if you possibly can. You can, almost always.

With .* at the beginning, the pcre matcher searches all the way to the 
end of the string, and then backtracks all the way back, looking for the 
first letter to match, in this case 'q'. At every 'q' it finds while 
working backwards, it stops, tries the rest of the pattern from there, 
and resumes backtracking if that fails.
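
A side effect of working backwards can be sketched like this, again with 
Python's re module standing in for a backtracking matcher (an assumption; 
it is not pcre2, and the input string is made up). The first place the 
leading .* gives back enough for 'q=' to match is the last 'q=' in the 
string, whether or not it is a parameter name of its own.

```python
import re

# The pattern from the quoted vcl_init, and the rewritten one.
GREEDY = re.compile(r".*(q=)(.*?)(&|$).*")
ANCHORED = re.compile(r"\b(q=)(.*?)(?:&|$)")

sample = "q=first&page=2&seq=9"   # invented input

# The leading .* runs to the end of the string, then backtracks to the
# last position where 'q=' matches, which here is inside 'seq='.
greedy_value = GREEDY.search(sample).group(2)

# With \b the matcher scans forward and stops at the first standalone 'q='.
anchored_value = ANCHORED.search(sample).group(2)

print(greedy_value, anchored_value)
```

So on input like this the two patterns don't even capture the same 
value, quite apart from the wasted work.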

pcre2 fortunately has an optimization that ignores a trailing .* if it 
has found a match up until there, so that it doesn't busily match the 
dot against every character left in the string. So the trailing .* does 
no harm, but it's superfluous, and violates the golden rule of pcre: 
avoid .* if at all possible.
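
That the trailing .* adds nothing to the captures can be checked with a 
small sketch. It uses Python's re module as a stand-in (an assumption; it 
says nothing about the pcre2 optimization itself), with an invented input 
string.

```python
import re

WITH_TAIL = re.compile(r"\b(q=)(.*?)(?:&|$).*")
WITHOUT_TAIL = re.compile(r"\b(q=)(.*?)(?:&|$)")

sample = "page=2&q=varnish&lang=en"   # invented input

a = WITH_TAIL.search(sample)
b = WITHOUT_TAIL.search(sample)

# The captured groups are the same either way.
same_captures = a.groups() == b.groups()

# The trailing .* only stretches the overall match to the end of the string.
print(same_captures, repr(a.group(0)), repr(b.group(0)))
```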

Incidentally, this is an area where re2 does have an advantage over 
pcre2. The efficiency of pcre2 matching depends crucially on how you 
write the regex, because details like \b instead of .* give it hints for 
pruning the search. While re2 matching usually isn't as fast as pcre2 
matching against well-written patterns, re2 doesn't depend so much on 
that sort of thing.


OK I can chill now,
Geoff
-- 
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de
