Varnish crashing when system starts to swap
Calle Korjus
calle at korjus.se
Thu Apr 17 15:03:39 CEST 2008
We have an environment that serves lots of small, dynamically generated image files from the backend. The total dataset is about 2TB, but we're not looking to cache all of it, just to ease the load on the backend machines. We currently handle about 2000-2500 hits/s in total and run 3 Apache servers with mod_caucho as frontends.
We have installed Varnish on the same servers as the Apache frontends and configured each instance to use the local Apache as its backend. The machines are dual Opterons with dual cores (4 cores per server) and 16GB of RAM, and we're running RHEL 4.2.
This is our varnish setup:
user varnish (201)
group varnish (201)
default_ttl 3600 [seconds]
thread_pools 1 [pools]
thread_pool_max 1000 [threads]
thread_pool_min 128 [threads]
thread_pool_timeout 60 [seconds]
overflow_max 100 [%]
rush_exponent 3 [requests per request]
sess_workspace 8192 [bytes]
obj_workspace 8192 [bytes]
sess_timeout 5 [seconds]
pipe_timeout 60 [seconds]
send_timeout 600 [seconds]
auto_restart on [bool]
fetch_chunksize 128 [kilobytes]
vcl_trace off [bool]
listen_address ":80"
listen_depth 1024 [connections]
srcaddr_hash 1049 [buckets]
srcaddr_ttl 30 [seconds]
backend_http11 off [bool]
client_http11 off [bool]
cli_timeout 5 [seconds]
ping_interval 3 [seconds]
lru_interval 3600 [seconds]
cc_command exec cc -fpic -shared -Wl,-x -o %o %s
max_restarts 4 [restarts]
max_esi_includes 5 [restarts]
cache_vbe_conns off [bool]
cli_buffer 8192 [bytes]
diag_bitmap 0x0 [bitmap]
This is our startup command:
/opt/varnish/sbin/varnishd -a :80 -p lru_interval 3600 -f /opt/varnish/conf/default.vcl -T 127.0.0.1:6082 -t 3600 -w 128,1000,60 -u varnish -g varnish -s file,/srv/varnish/varnish_storage.bin,30G -P /var/run/varnish.pid
Varnish looks fine until it has served about 1.5 million requests. Then we see kswapd0 and kswapd1 start working, the load average rises to about 200 and the machine becomes totally unresponsive. top shows a lot of CPU time being spent in I/O wait, and the Varnish child process sometimes restarts. In the best case the process restarts and the server behaves normally again within 5 minutes, but sometimes Varnish dies completely. One thing we have noticed is that the memory reserved by Varnish keeps rising, and when it crashes it is usually around 14G.
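The swap activity described above could be logged alongside the request count to pin down when it starts. This is only a minimal sketch, assuming a Linux host where /proc/meminfo reports SwapTotal and SwapFree in kB; the function names are hypothetical and not part of Varnish.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB (SwapTotal minus SwapFree)."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the live value; the default path is an assumption (Linux only)."""
    with open(path) as f:
        return swap_used_kb(parse_meminfo(f.read()))
```

Calling read_swap_used_kb() every few seconds and writing the result next to a timestamp would show whether swap use starts growing at about the same point each run.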
The Varnish storage file is on the same physical disk as the system and the swap; could that be the problem? Should Varnish really allocate so much memory that the system starts to swap to disk?
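The sizes behind this question can be put side by side. This is a rough sketch, assuming the file storage is memory-mapped, so that its pages compete for RAM with everything else on the machine, and treating both the 30G storage size and the 16GB of RAM as binary gigabytes. The helper name unbacked_bytes is hypothetical.

```python
GIB = 1024 ** 3  # assumption: "G" and "GB" both mean 2**30 bytes here


def unbacked_bytes(storage_bytes, ram_bytes):
    """Bytes of a mapped storage file that cannot be resident in RAM at once."""
    return max(0, storage_bytes - ram_bytes)


storage = 30 * GIB  # from "-s file,...,30G" in the startup command
ram = 16 * GIB      # physical memory per server

print(unbacked_bytes(storage, ram) // GIB)  # prints 14
```

Under these assumptions a full storage file is 14G larger than physical memory, even before counting Apache and the rest of the system.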
Any suggestions or comments are welcome.
Regards
Calle Korjus