BAN - Memory consumption exploding under load

Hugues Alary hugues at betabrand.com
Tue Jun 19 20:54:46 UTC 2018


In case someone runs into a similar issue with bans accumulating: I found
an experimental runtime parameter, `ban_cutoff` (documented below), which
exists specifically for this problem.

A search for `ban_cutoff` in this mailing list archive also yields a good
thread about this (thread: "Varnish Lurker is getting slower / Ban lists
keeps increasing")

ban_cutoff
        Value is: 0 [bans]
        Default is: 0
        Minimum is: 0

        Expurge long tail content from the cache to keep the number of
        bans below this value. 0 disables.

        When this parameter is set to a non-zero value, the ban lurker
        continues to work the ban list as usual top to bottom, but when
        it reaches the ban_cutoff-th ban, it treats all objects as if
        they matched a ban and expurges them from cache. As actively
        used objects get tested against the ban list at request time
        and thus are likely to be associated with bans near the top of
        the ban list, with ban_cutoff, least recently accessed objects
        (the "long tail") are removed.

        This parameter is a safety net to avoid bad response times due
        to bans being tested at lookup time. Setting a cutoff trades
        response time for cache efficiency. The recommended value is
        proportional to rate(bans_lurker_tests_tested) / n_objects
        while the ban lurker is working, which is the number of bans
        the system can sustain. The additional latency due to request
        ban testing is in the order of ban_cutoff /
        rate(bans_lurker_tests_tested). For example, for
        rate(bans_lurker_tests_tested) = 2M/s and a tolerable latency
        of 100ms, a good value for ban_cutoff may be 200K.

        NB: We do not know yet if it is a good idea to change this
        parameter, or if the default value is even sensible. Caution is
        advised, and feedback is most welcome.
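
For completeness, here is how the parameter can be changed, either at
runtime through varnishadm or on the varnishd command line with -p. The
200K value is just the worked example from the documentation above; the
right value depends on your own rate(bans_lurker_tests_tested) and on the
latency you can tolerate:

    # inspect the current value and built-in documentation
    varnishadm param.show ban_cutoff

    # change it at runtime (takes effect immediately, not persisted
    # across a restart of varnishd)
    varnishadm param.set ban_cutoff 200000

    # or set it when starting varnishd
    varnishd ... -p ban_cutoff=200000

Once set, MAIN.bans_lurker_obj_killed_cutoff in varnishstat shows how many
objects the lurker expurges because of the cutoff.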


On Thu, May 31, 2018 at 3:49 PM Hugues Alary <hugues at betabrand.com> wrote:

> Hi there,
>
> Re-opening this thread from 2016!
>
> A few days ago, I ran into an issue where varnish (varnish-6.0.0 revision
> a068361dff0d25a0d85cf82a6e5fdaf315e06a7d) became completely unresponsive.
>
> While investigating, I noticed a lot of `VSL: store overflow` errors in
> varnishlog, but most importantly, I noticed our BAN list was starting to
> accumulate, reaching somewhere around 130K BANs reported by varnishstat.
>
> The load on the website was pretty low, and restarting Varnish fixed the
> issue. It is possible I had been running some script generating a lot of
> BANs, but it still seems weird that it would generate so many BANs that
> Varnish would become unresponsive.
>
> After restarting Varnish, I watched `varnishstat -1 | grep bans` and could
> see MAIN.bans going up and down, but staying fairly stable on average.
>
> Today, I decided to check again and I noticed:
>
> MAIN.bans                     66863          .   Count of bans
> MAIN.bans_completed           66834          .   Number of bans marked 'completed'
> MAIN.bans_obj                 66849          .   Number of bans using obj.*
> MAIN.bans_req                    14          .   Number of bans using req.*
> MAIN.bans_added              170939         1.18 Bans added
> MAIN.bans_deleted            104076         0.72 Bans deleted
> MAIN.bans_tested            3767852        26.08 Bans tested against objects (lookup)
> MAIN.bans_obj_killed          62024         0.43 Objects killed by bans (lookup)
> MAIN.bans_lurker_tested   3344073359     23148.62 Bans tested against objects (lurker)
> MAIN.bans_tests_tested      94711644       655.62 Ban tests tested against objects (lookup)
> MAIN.bans_lurker_tests_tested   4573910595     31661.91 Ban tests tested against objects (lurker)
> MAIN.bans_lurker_obj_killed          16627         0.12 Objects killed by bans (lurker)
> MAIN.bans_lurker_obj_killed_cutoff            0         0.00 Objects killed by bans for cutoff (lurker)
> MAIN.bans_dups                           113845         0.79 Bans superseded by other bans
> MAIN.bans_lurker_contention                   3         0.00 Lurker gave way for lookup
> MAIN.bans_persisted_bytes             163879330          .   Bytes used by the persisted ban lists
> MAIN.bans_persisted_fragmentation     162795194          .   Extra bytes in persisted ban lists due to fragmentation
>
>
> 1- MAIN.bans is almost at 70k and varnish has been running for 1 day 16 hours
> 2- MAIN.bans_req is > 0, but nowhere in my configuration do I ever submit bans using req.*
>
> So I have two questions:
>
> - is 70k+ bans unreasonable, and should I find a way to reduce them?
> - why am I seeing 14 MAIN.bans_req when my config doesn't ever emit such bans?
>
> Thanks,
> -Hugues
>
> On Thu, Nov 3, 2016 at 3:04 AM, Dridi Boukelmoune <dridi at varni.sh> wrote:
>
>> On Thu, Oct 27, 2016 at 6:02 PM, Nils Goroll <slink at schokola.de> wrote:
>> > Hi,
>> >
>> > we've added a bunch of ban and ban lurker improvements which did not get
>> > backported to 4.1.3
>> >
>> > Please upgrade to 5.0 or master.
>>
>> Hi Nils,
>>
>> I was thinking (against the too-many-knobs trend) that maybe we should
>> have some sort of ban_queue_limit parameter to avoid piling up too
>> many bans. It would also play nice with your std.ban() initiative
>> where hitting the queue limit could be one of the reasons returned by
>> the function when it fails.
>>
>> The default could be 0 for unlimited, or a fairly large number.
>>
>> Dridi
>>
>
>


More information about the varnish-misc mailing list