Varnish restarting sporadically... losing entire cache...

Chris Hecker checker at d6.com
Fri Jun 25 04:42:14 CEST 2010


Run "uname -m" to check:
  i386 or i686 means 32-bit
  x86_64 means 64-bit
Source:
http://www.linuxforums.org/forum/linux-newbie/72026-how-tell-if-32-64-bit-linux-machine-i-am-accessing.html
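
A rough sketch of that check as a shell snippet. The helper name arch_bits is made up for illustration, and it only maps the three values listed above; anything else is reported as unknown. Note that uname -m describes the running kernel, not necessarily the varnishd binary. The panic message quoted further down carries the same hint in its ident string (i686, on a PAE kernel).

```shell
#!/bin/sh
# arch_bits is a hypothetical helper: it maps a machine string, as printed
# by `uname -m`, to a word size. Only i386, i686 and x86_64 are handled.
arch_bits() {
    case "$1" in
        i386|i686) echo 32 ;;
        x86_64)    echo 64 ;;
        *)         echo unknown ;;
    esac
}

# Report what the running kernel says about itself.
machine=$(uname -m)
echo "kernel architecture: $machine ($(arch_bits "$machine") bit)"
```

Running it on the box in question should print the machine string next to the word size, which is quicker than guessing from the package name.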

I'm also on 32-bit CentOS, and hoping Varnish will work.

Chris


Ben Nowacky wrote:
> Nope, this is a dedicated server.. We're running CentOS... How do you know we're running the 32-bit version? I had to compile from source on CentOS, so just grabbed the binaries from the site and did a build from them.  How are you guessing it's 32-bit? 
> 
> Definitely not familiar with analyzing core-dumps or even getting them to run... I'm not a sys-admin, just the guy stuck trying to get our servers ready for an onslaught of traffic coming next week that I know we can not handle right now.... 
> On Jun 24, 2010, at 7:35 PM, Kristian Lyngstøl wrote:
> 
>> If it's not the vm you will have to turn on core dumps to figure it
>> out. That involves setting ulimit -c unlimited in the startup script
>> (or running it manually on the shell you start varnish from). You also
>> likely want to set /proc/sys/kernel/core_pattern to a path where you can
>> both fit the core dump and actually find it. If you're unfamiliar with
>> analyzing core dumps, you can gzip it and send it to me along with
>> your varnish binaries, if you want to.
>>
>> As for logging, I suppose it might have changed in Ubuntu. I'll have
>> to check that. You got the assert error though, so it's all there.
>>
>> Just out of curiosity though: why 32-bit? Is it by any chance a
>> virtual machine, or similar?
>>
>> -Kristian
>> PS: I'm not on a computer right now, so you will want to verify the
>> ulimit argument-name and core_pattern path.
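
The core dump setup described above might look roughly like this in shell. This is a sketch, not a tested init script: the function names and the /var/tmp/cores directory are assumptions chosen for illustration. On Linux the pattern file is /proc/sys/kernel/core_pattern, and %e and %p are the standard specifiers for executable name and PID (see core(5)). Writing the pattern needs root.

```shell
#!/bin/sh
# Build a core_pattern value that places dumps in the given directory.
# %e expands to the executable name and %p to the PID of the dumping process.
core_pattern_for() {
    printf '%s/core.%%e.%%p\n' "$1"
}

# Enable core dumps for this shell and its children, then point the kernel
# at the chosen directory. Must run as root, in the same shell (or startup
# script) that later starts varnishd.
enable_core_dumps() {
    dir=$1
    mkdir -p "$dir" || return 1
    ulimit -c unlimited || return 1
    core_pattern_for "$dir" > /proc/sys/kernel/core_pattern
}

# Usage, with an assumed directory that has room for a large dump:
#   enable_core_dumps /var/tmp/cores
```

Pick a directory on a filesystem with enough free space, since a dump of a process with a large cache can be big.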
>>
>> 2010/6/25, Ben Nowacky <bnowacky at competitorgroup.com>:
>>> Thanks Kristian! Been reading your blog, and got some of these from your
>>> site... Guess I went overboard with some of them...
>>>
>>> - There is no /var/log/syslog so nothing else is being logged. This is the
>>> only location I've been able to get any debug info out of varnish. We're not
>>> tapping out VM or anything else it appears though.. Everything looks okay on
>>> that front, but I'm going to lower the max threads and see how that takes
>>> us.. maybe it'll be a simple solution.
>>>
>>> Appreciate the help!
>>> On Jun 24, 2010, at 7:00 PM, Kristian Lyngstøl wrote:
>>>
>>>> As Per says, it's likely you run out of vm space. You are also
>>>> specifying a great deal of parameters which I suspect are not actually
>>>> adjusted to your site. I would not recommend half of them unless you
>>>> actually know why.
>>>>
>>>> It looks like your log entries are from /var/log/messages. You will
>>>> likely find more in /var/log/syslog on Ubuntu.
>>>>
>>>> Also: 5000 threads is going to be far too many on a 32-bit system.
>>>> Using 64-bit is by far the simplest way to avoid hassle. If you insist
>>>> on 32-bit, you will need to reduce the maximum amount of threads, and
>>>> possibly adjust the stack size, though newer varnish packages might
>>>> try to do the latter. At any rate, closely monitor vm-usage.
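
To put rough numbers on the thread advice above, here is a back-of-the-envelope calculation in shell. Every constant is an assumption for illustration: 8192 KiB is a common Linux default stack size (check "ulimit -s" on the actual box), 256 KiB is just an example of a reduced stack, and 3072 MiB is the usual user address space on 32-bit Linux. Only the 5000 threads and the 1G malloc storage come from the posted config. The helper name stack_vm_mib is made up.

```shell
#!/bin/sh
# Virtual address space, in MiB, reserved for thread stacks alone.
# $1 = number of threads, $2 = stack size per thread in KiB.
stack_vm_mib() {
    echo $(( $1 * $2 / 1024 ))
}

THREADS=5000          # thread_pool_max from the posted config
CACHE_MIB=1024        # -s malloc,1G from the posted config
USER_VM_MIB=3072      # assumption: about 3 GiB usable on 32-bit Linux
DEFAULT_STACK_KIB=8192  # assumption: check `ulimit -s`
SMALL_STACK_KIB=256     # assumption: example of a reduced stack size

echo "address space left after cache: $(( USER_VM_MIB - CACHE_MIB )) MiB"
echo "stacks at ${DEFAULT_STACK_KIB} KiB: $(stack_vm_mib "$THREADS" "$DEFAULT_STACK_KIB") MiB"
echo "stacks at ${SMALL_STACK_KIB} KiB: $(stack_vm_mib "$THREADS" "$SMALL_STACK_KIB") MiB"
```

Under these assumptions the default stack size needs far more address space than a 32-bit process has, while the reduced one fits but leaves little headroom, which is why both fewer threads and a smaller stack are suggested.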
>>>>
>>>> Also, signal 11 is a segfault. This means invalid or illegal memory
>>>> access, which could match the symptoms of a 32-bit
>>>> varnish-installation running out of virtual memory address space.
>>>>
>>>> - Kristian
>>>>
>>>> 2010/6/25, Ben Nowacky <bnowacky at competitorgroup.com>:
>>>>> Here's the error I get consistently:
>>>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21427) died signal=11
>>>>> Jun 24 23:35:31 srv860 varnishd[20605]: child (21660) Started
>>>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21660) said
>>>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21660) said Child starts
>>>>>
>>>>> Here's my config:
>>>>> "-f /usr/local/varnish-2.1.2/etc/default.vcl \
>>>>> 	     -s malloc,1G \
>>>>> 	     -p thread_pool_max=5000 \
>>>>> 	     -p thread_pools=4 \
>>>>> 	     -p thread_pool_min=200 \
>>>>> 	     -p thread_pool_add_delay=1ms \
>>>>> 	     -p cli_timeout=1000s \
>>>>> 	     -p ping_interval=1 \
>>>>> 	     -p cli_buffer=16384 \
>>>>> 	     -p session_linger=20ms \
>>>>> 	     -p lru_interval=360s \
>>>>> 	     -p listen_depth=8192 \
>>>>> 	     -h classic,500009 \
>>>>> 	     -T localhost:2000 "
>>>>>
>>>>> Am I doing anything in here atrocious that would be causing the random
>>>>> resets? I've tried file and malloc storage to no avail.. Neither one
>>>>> fixed the issue. I've tried adjusting sess_timeout, sess_workspace,
>>>>> etc... also nothing..  Changed the hash from classic to critbit also,
>>>>> with no success. Bashing head against the wall, if anyone has any
>>>>> advice could really use it!!
>>>>>
>>>>>
>>>>> On Jun 24, 2010, at 10:58 AM, Caunter, Stefan wrote:
>>>>>
>>>>>> Check dmesg too; the child is probably dying. I found a problem with
>>>>>> persistent storage and had to go back to file.
>>>>>>
>>>>>> Stefan Caunter :: Senior Systems Administrator :: TOPS
>>>>>> e: scaunter at topscms.com  ::  m: (416) 561-4871
>>>>>> www.thestar.com www.topscms.com
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: varnish-misc-bounces at varnish-cache.org
>>>>>> [mailto:varnish-misc-bounces at varnish-cache.org] On Behalf Of Ben Nowacky
>>>>>> Sent: June-24-10 1:51 PM
>>>>>> To: Flavio Torres
>>>>>> Cc: varnish-misc at varnish-cache.org
>>>>>> Subject: Re: Varnish restarting sporadically... losing entire cache...
>>>>>>
>>>>>> Thanks Flavio! Here are the errors that I see in /var/log/messages...
>>>>>> Is this what you were seeing?
>>>>>>
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22165) Panic message:
>>>>>> Assert error in SMP_FreeObj(), storage_persistent.c line 802:
>>>>>> Condition(sg->nfixed > 0) not true. thread = (cache-timeout) ident =
>>>>>> Linux,2.6.18-128.4.1.el5PAE,i686,-spersistent,-hclassic,epoll Backtrace:
>>>>>> 0x806ca7c: pan_ic+cc   0x808851e: SMP_FreeObj+13e   0x8064b5f:
>>>>>> HSH_Deref+21f   0x80618d1: exp_timer+321   0x806f1fd: wrk_bgthread+cd
>>>>>> 0x44249b: /lib/libpthread.so.0 [0x44249b]   0x39942e:
>>>>>> /lib/libc.so.6(clone+0x5e) [0x39942e]
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: child (22984) Started
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Child starts
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Dropped 0
>>>>>> segments to make free_reserve
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Silo
>>>>>> completely loaded
>>>>>> On Jun 24, 2010, at 10:51 AM, Flavio Torres wrote:
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> varnish-misc mailing list
>>>>> varnish-misc at varnish-cache.org
>>>>> http://lists.varnish-cache.org/mailman/listinfo/varnish-misc
>>>>>
>>>
> 
> 


