strange temporary varnish outage

Hu Bert revirii at googlemail.com
Mon Feb 18 09:58:24 UTC 2019


Hello,

we're using Varnish 5 (Debian stretch) for image caching. Yesterday we
had a strange outage, and I can't find the cause because there are
almost no log entries apart from this one:

Feb 17 09:03:47 rowlf kernel: [1047133.190149] cgroup: fork rejected by pids controller in /system.slice/varnish.service

But the problems started a couple of minutes before that, so this
message could simply be a consequence of the earlier problems. Some
munin graphs:

Backend traffic: strange spike in backend connection retry/success,
decrease in recycle/reuse:
https://abload.de/img/varnish_backend_traffqwj74.png

Expunge: a similar spike in "Number of expired objects":
https://abload.de/img/varnish_expunge-day5kk0l.png

Threads: the thread count had been lower since the last restart (done
on Feb 14th) and suddenly went up at the time of the outage.
day: https://abload.de/img/varnish_threads-dayzoken.png
week: https://abload.de/img/varnish_threads-week7qjoo.png
Backend graph: https://abload.de/img/nginx_status-day54jkd.png
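
To put numbers behind the thread graphs, the thread-related counters
can be pulled out of `varnishstat -1`. This is only a minimal sketch,
assuming the usual one-counter-per-line output (name, value, rate,
description); the helper names parse_varnishstat and thread_summary are
made up for illustration:

```python
# Sketch: extract thread-related counters from `varnishstat -1` output.
# Assumption: each line looks like "<name> <value> <rate> <description>".

THREAD_COUNTERS = (
    "MAIN.threads",           # current number of worker threads
    "MAIN.threads_limited",   # times thread creation hit the configured max
    "MAIN.threads_failed",    # times thread creation failed
    "MAIN.thread_queue_len",  # requests waiting for a free thread
)


def parse_varnishstat(text):
    """Parse `varnishstat -1` text into a {counter_name: int_value} dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or malformed line
        try:
            counters[parts[0]] = int(parts[1])
        except ValueError:
            continue  # value column is not an integer, skip it
    return counters


def thread_summary(counters):
    """Return only the thread counters; missing ones map to None."""
    return {name: counters.get(name) for name in THREAD_COUNTERS}
```

Feeding it the saved output of `varnishstat -1` from before and during
an incident would show whether threads_failed or threads_limited moved
together with the thread spike.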

/etc/systemd/system/varnish.service : https://pastebin.com/aAhMHn4p
Here's the (shortened) vcl file: https://pastebin.com/nVu5vVaa
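
Regarding the "fork rejected by pids controller" message: one way to
see how close the service is to its task limit is to read the pids
cgroup files directly. This is a rough sketch, assuming a cgroup v1
layout with the pids controller mounted under /sys/fs/cgroup/pids (the
path is an assumption and differs on cgroup v2); the function names are
made up for illustration:

```python
# Sketch: compare pids.current against pids.max for a service cgroup.
from pathlib import Path

# Assumed cgroup v1 location for the unit named in the kernel message.
DEFAULT_CGROUP = "/sys/fs/cgroup/pids/system.slice/varnish.service"


def read_pids_usage(cgroup_dir):
    """Return (current, limit); limit is None when pids.max is 'max'."""
    base = Path(cgroup_dir)
    current = int((base / "pids.current").read_text().strip())
    raw_limit = (base / "pids.max").read_text().strip()
    limit = None if raw_limit == "max" else int(raw_limit)
    return current, limit


def near_limit(current, limit, threshold=0.9):
    """True if current task count is at or above threshold * limit."""
    if limit is None:
        return False  # unlimited, so the controller cannot reject forks
    return current >= threshold * limit


if __name__ == "__main__":
    if Path(DEFAULT_CGROUP).is_dir():
        cur, lim = read_pids_usage(DEFAULT_CGROUP)
        print(f"tasks: {cur} / {lim if lim is not None else 'unlimited'}")
        print("close to limit" if near_limit(cur, lim) else "below limit")
    else:
        print(f"{DEFAULT_CGROUP} not found (different cgroup layout?)")
```

Since every Varnish worker thread counts as a task here, a thread count
approaching the limit would explain the rejected forks; the limit
itself comes from the TasksMax= setting of the systemd unit.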

Does anyone have an idea how to dig into this? Is something horribly
wrong in the VCL file?


Thx,
Hubert

