Strange bug since upgrading to varnish 4

Hugues Alary hugues at betabrand.com
Sat Nov 15 03:04:52 CET 2014


Hi Lasse,

I finally had some time to do some more debugging.

I tried many different setting, and I still run into the same bug. In order
for you to reproduce it as easily and as fast as possible please find the
following gist https://gist.github.com/huguesalary/fc59034d3a49745755f4
which contain:

- information about the machine running varnish
- the configuration snippet
- a command line triggering the bug 100% of the time on my debian servers
and my docker containers (that are hosted on a debian server too)
- a syslog output
- a dmesg output
- the *full* output from varnishlog.

I successfully reproduced the bug with the default varnish setting, but
also with really high values for these settings:

pcre_match_limit=1000000
pcre_match_limit_recursion=1000000
workspace_client=50M

If you'd like I can file a bug report in the tracker.

Thank you for your help.

-Hugues

On Thu, Nov 13, 2014 at 9:03 AM, Hugues Alary <hugues at betabrand.com> wrote:

> Hi Lasse,
>
> Thanks for you answer. I hadn't thought of looking in syslog, here's what
> I found:
>
> Nov 13 04:12:55 varnishd[24316]: Child (26394) died signal=11
> Nov 13 04:12:55 varnishd[24316]: Child cleanup complete
> Nov 13 04:12:55 varnishd[24316]: child (26614) Started
> Nov 13 04:12:55 varnishd[24316]: Child (26614) said Child starts
> Nov 13 04:12:59 kernel: varnishd[26830]: segfault at 6543feb43f00 ip
> 0000654412dd27ed sp 00006543feb43ec0 error 6 in
> libpcre.so.3.13.1[654412dbf000+3c000]
> Nov 13 04:12:59 kernel: grsec: From 58.7.84.48: Segmentation fault
> occurred at 00006543feb43f00 in /usr/sbin/varnishd[varnishd:26830]
> uid/euid:65534/65534 gid/egid:65534/65534, parent
> /usr/sbin/varnishd[varnishd:24316] uid/euid:
> 0/0 gid/egid:0/0
>
> libpcre? Sounds promising.
>
> I'll try to increment the prce_match_limit and report back.
>
> Side note: on my mac, I can reproduce the bug, but not consistently. I
> noticed in my logs these lines:
>
> 11/13/14 8:58:13.507 AM varnishd[12948]: Child (12958) said HASHMAGIC:
> <47bf313ea1f6e04499e1751a3b9ca878bb51faaaf5e2128eeed6ef5b465a8e6c> ->
> <000000000000000000000000000000000000000000000000000000HASHMAGIC:
> <003c006a00b100dc02a0>
> 11/13/14 8:58:13.507 AM varnishd[12948]: Child (12958) said
> 68ab0cfdc1acf821774561a994e85995e9527eae9720da768d1066> ->
> <0000000000000000000000000000000000000000000000000000000000000040>
> 11/13/14 8:58:13.508 AM varnishd[12948]: Child (12958) said HASHMAGIC:
> <b365fe539a4392542d968e3a3256e9feac46227167564b320fe5f21c1b5467d5> ->
> <0000000000000000000000000000000000000000000000000000000000000080>
> 11/13/14 8:58:13.515 AM varnishd[12948]: Child (12958) said HASHMAGIC:
> <1908209a8bdd9ae3690cd3c7937000311dd0df8546d152bf5c683ef17be0f086> ->
> <0100000000000000000000000000000000000000000000000000000000000000>
> 11/13/14 8:58:13.516 AM varnishd[12948]: Child (12958) said HASHMAGIC:
> <92bf1ef1dc244bdf6e8b09b06576c4de2888f9c1aebba2270911e4c20a0ce569> ->
> <0200000000000000000000000000000000000000000000000000000000000000>
> 11/13/14 8:58:13.558 AM varnishd[12948]: Child (12958) said HASHMAGIC:
> <43d7cc8e4b069561e6336eebca63b9766667f952648003d0ca35211d9ec07e55> ->
> <8000000000000000000000000000000000000000000000000000000000000000>
> 11/13/14 8:58:13.563 AM varnishd[12948]: Child (12958) said HASHMAGIC:
> <33bcb999060165791f7b08a0d9cc25bca64e737ed4faea19ecbe1fdb955a0d4d> ->
> <4000000000000000000000000000000000000000000000000000000000000000>
>
> Is this varnish somehow degrading gracefully when the bug occurs? (which
> would explain why I can reproduce is 100% of the time)
>
>
>
> On Thu, Nov 13, 2014 at 6:31 AM, Lasse Karstensen <
> lkarsten at varnish-software.com> wrote:
>
>> On Thu, Nov 13, 2014 at 01:37:48AM -0800, Hugues Alary wrote:
>> > I just upgraded to from 3 (.0.5/6) to 4.0.2 on my debian system.
>> > I converted my old varnish 3 code to 4.0.2. The varnish 3 code has been
>> > working flawlessly for more than 2 years.
>> > Now, in my varnish 4 VCL, everything works fine except for a very
>> specific
>> > piece of code that seem to crash(?) varnish. I say crash with a question
>> > mark as I haven't been able to actually confirm it crashes.
>> > The symptom of the bug is the following: when I connect with a mobile
>> > browser, the connection seems like it's being reset, and the browser
>> shows:
>> [..]
>> > In varnishadm panic.show tells me that no panic happened recently.
>> > My guess is that my condition is so long that Varnish can't handle it,
>>
>> Hi.
>>
>> I pieced together your VCL as best I could (your email client/the list
>> software line wrapped some regexes), but I'm not able to reproduce your
>> problem with 4.0.2 on debian.
>>
>> A bit of a long shot, but are there any mentions of Varnish 4 segfaults
>> in dmesg?
>>
>> My guess would be that you either are running out of workspace_client or
>> or are hitting pcre_match_limit. I suggest you increase them and see if
>> the problem persists.
>>
>>
>> --
>> Lasse Karstensen
>> Varnish Software AS
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20141114/d2b3555c/attachment.html>


More information about the varnish-misc mailing list