Varnish child killed

Geoff Simmons geoff at uplex.de
Thu Apr 21 13:42:03 CEST 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 4/21/11 10:51 AM, Jean-Francois Laurens wrote:
> 
> We've been running Varnish 2.1.5 for some weeks now and we still do not
> understand some behavior regarding the shared memory activity.

There's not enough information here for anything better than guesses
about what's going on.

> We specified a -sfile,/var/lib/varnish/varnish_storage.bin,50G in the
> configuration but it’s impossible to go higher than 25G used by varnish.
[...]
> 
> In addition I can see varnish doesn’t seem to be able to handle more
> than 1 million objects:

It's not uncommon for Varnish to use significantly less memory than was
allocated. That's not because Varnish can't "handle" more; it just works
out that way. Depending on usage patterns, TTLs, your command line
settings and your VCL, the cache may simply never need more than that.

What do your cache hit ratios say? Do the logs or varnishstat give any
indication that objects are not being cached when you think they should
be? Do you have objects that, semantically, could be cached, but aren't
because, for example, they are unnecessarily setting cookies? You might
be able to get more into the cache by tweaking VCL, but as I said,
that's just a guess.
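
To put a number on the hit-ratio question, here is a minimal Python sketch. It assumes you have captured text in the shape of `varnishstat -1` output (name, value, per-second rate, description) and that the counters are named `cache_hit` and `cache_miss`. The sample values are invented for illustration and are not from the poster's system.

```python
# Hypothetical sample shaped like `varnishstat -1` output.
# The numbers are made up; substitute your own capture.
SAMPLE = """\
cache_hit                750000        41.20 Cache hits
cache_hitpass              5000         0.27 Cache hits for pass
cache_miss               250000        13.73 Cache misses
"""


def parse_counters(text):
    """Map counter name -> integer value for each well-formed line."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect at least "<name> <value>"; skip anything else.
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def hit_ratio(counters):
    """Hits divided by hits plus misses; 0.0 if there was no traffic."""
    hits = counters.get("cache_hit", 0)
    misses = counters.get("cache_miss", 0)
    total = hits + misses
    return hits / total if total else 0.0


counters = parse_counters(SAMPLE)
ratio = hit_ratio(counters)
print(f"hit ratio: {ratio:.1%}")
```

A low ratio alongside a half-empty storage file would point toward objects not being cached (cookies, short TTLs, VCL), rather than toward a limit on how much Varnish can hold.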

> When the child process got killed, the load of the system was very high:
> Apr 20 21:46:44 server-01-39 varnishd[21087]: Child (5372) not
> responding to CLI, killing it.
> ....
> Apr 20 21:49:57 server-01-39 nrpe[18101]: Command completed with return
> code 2 and output: CRITICAL -*load average: 159.00, 159.32,
> 77.02*|load1=159.000;15.000;30.000;0; load5=159.320;10.000;25.000;0;
> load15=77.020;5.000;20.000;0;
> ....
> Apr 20 21:48:43 server-01-39 varnishd[21087]: Child (5372) not
> responding to CLI, killing it.

It looks like the message about high load came after the Varnish
processes died. The load might have risen, at least in part, because
Varnish was restarted and was getting nothing but cache misses, unless
it was caused by something else. Which processes were showing
the highest CPU usage?
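
The ordering of those messages can be checked mechanically. The Python sketch below sorts the three quoted syslog lines by timestamp. The mail's line wrapping is undone and the nrpe output is shortened; the year is an assumption taken from the mail date, since syslog timestamps carry none. Note that this only shows when the alert was logged, not when the load actually began to climb.

```python
from datetime import datetime

# Log lines quoted in this thread, unwrapped; nrpe output shortened.
LOG_LINES = [
    "Apr 20 21:46:44 server-01-39 varnishd[21087]: Child (5372) not "
    "responding to CLI, killing it.",
    "Apr 20 21:49:57 server-01-39 nrpe[18101]: Command completed with "
    "return code 2 and output: CRITICAL - load average: 159.00, 159.32, 77.02",
    "Apr 20 21:48:43 server-01-39 varnishd[21087]: Child (5372) not "
    "responding to CLI, killing it.",
]


def parse_line(line, year=2011):
    """Return (timestamp, process name) for one syslog line."""
    fields = line.split()
    stamp = " ".join(fields[:3])            # e.g. "Apr 20 21:46:44"
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    process = fields[4].split("[")[0]       # "varnishd[21087]:" -> "varnishd"
    return when, process


events = sorted(parse_line(line) for line in LOG_LINES)
for when, process in events:
    print(when.strftime("%H:%M:%S"), process)

first_kill = min(w for w, p in events if p == "varnishd")
first_alert = min(w for w, p in events if p == "nrpe")
print("alert logged after first kill:", first_alert > first_kill)
```

Running the same comparison over the full syslog, including any earlier load alerts, would give a better picture than these three lines alone.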

The real question is why the Varnish child was no longer responding to
pings. Do you have any panic messages from Varnish in your syslog, or
anything else indicating the error? If the load was that high *before*
the processes died, your system might have been under so much stress
that the child processes just couldn't answer pings in time. In which
case your real problem might be something other than Varnish.

> All this makes me believe we have an issue with some kernel parameters
> that do not allow varnish to handle as many objects as we configured it.

It could be that, it could be another process that was causing heavy
load, it could be your VCL or your command line settings. Too many open
questions here.


Best,
Geoff
- -- 
UPLEX Systemoptimierung
Schwanenwik 24
22087 Hamburg
http://uplex.de/
Mob: +49-176-63690917
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.14 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBCAAGBQJNsBgKAAoJEOUwvh9pJNURzS0QAJOvVWr3Yi4DsA2x0Ck+/HTa
pkL69dRhUskq5Ll6Ny+e0DBB9I3Dx48ZT9ZxzRcvIZQn4shPl1GPdQQRHCB0ek82
o8lLCdS/ta2HZhQI96FSUBj5RYDrPd3B78cAlvDLYzHsZIUbg90WmizHE/x9vPOi
z5TOS/0S3Ao7JIuqkMpkWYyVs4AH6aKIX1L9er9jYLbHp5s8R2ilzs3USeLdC8Kl
spGAaSn4mcCVHmhR+ZQ2XQjaf2nxN7oXEIviGOZOWfZ1XX1hQpDtjhp1D9BoInBW
oNZmamt6Hd+m00LCu88YhTiBMRDD7zbom9C0NWLf6n7LaCIQteM/KEo1z9tPLAS6
qmQzv+EvBKG5Dpcp81v5TqiUyVDzsYFegoKR6FKCCXvTlCI6avBlik1AlXRhecsF
27da7zMVvoDC44Wo+zqRkwMrtzpmE/Y55wdkP3YBUg/m4nzvci1VYTy3W436NfMe
ypjWJ+bQEL9erSURNVDZLl6+I/J4cdcRxPEn96/7vaoDnq9HlvSI9SbAGWj4TDhA
ksyvDB2VBGyfaVPnmPy/4CdjbDFXB5lzF2PezUhChrehKoJXeKXPNqegKV89VAo9
EH298HuxKO+xZkVMfO9g0kHdFp6VGSCU8Y+ddU2/tMhxHGMCoXOC/sdcuCHl5HRW
G6cSzXYum2Y1ootALk7U
=OksF
-----END PGP SIGNATURE-----



