High read IOPS with file backend

Bryan Stillwell bstillwell at photobucket.com
Tue Aug 20 00:30:40 CEST 2013


I decided to give the persistent storage backend a try today to see if
it behaved any differently, but it still appears to be quite
experimental.  Within minutes varnishd had restarted itself a couple of
times, so I disabled it in the load balancer.

Here are the reasons reported in /var/log/messages:

Aug 19 16:08:47 v01 varnishd[30007]: Child (30008) not responding to CLI, killing it.
Aug 19 16:08:47 v01 varnishd[30007]: Child (30008) died signal=6 (core dumped)
Aug 19 16:08:47 v01 varnishd[30007]: Child (30008) Panic message:
Assert error in HSH_Unbusy(), cache_hash.c line 645:
  Condition(oc_getobj(sp->wrk, oc) == o) not true.
thread = (cache-worker)
ident = Linux,2.6.32-358.14.1.el6.x86_64,x86_64,-spersistent,-smalloc,-hcritbit,epoll
Backtrace:
  0x42f728: /usr/sbin/varnishd() [0x42f728]
  0x418175: /usr/sbin/varnishd() [0x418175]
  0x419195: /usr/sbin/varnishd(CNT_Session+0x655) [0x419195]
  0x431471: /usr/sbin/varnishd() [0x431471]
  0x3b89607851: /lib64/libpthread.so.0() [0x3b89607851]
  0x3b892e890d: /lib64/libc.so.6(clone+0x6d) [0x3b892e890d]
sp = 0x7ef14445a008 {
  fd = 230, id = 230, xid = 1762400248,
  client = XX.X.XX.XX 57031,
  step = STP_FETCHBODY,
  handling = deliver,
  err_code = 200, err_reason = (null),
  restarts = 0, esi_level = 0
  flags =  is_gunzip
  bodystatus = 4
  ws = 0x7ef14445a080 {
    id = "sess",
    {s,f,r,e} = {0x7ef14445ac78,+3104,(nil),+65536},
  },
  http[req] = {
    ws = 0x7ef14445a080[sess]
      "GET",
      "/XXXXXX/XXXX/XXXXXXXXXXXX/XXXXXXXXX.jpg",
      "HTTP/1.1",
      "Cache-Control: max-age=0",
      "Connection: Persist,Keep-alive",
      "Date: Mon, 19 Aug 2013 22:08:44 GMT",
      "Via: 1.1 192.221.115.43:80 (Footprint 4.8/FPMCP)",
      "Accept: image/webp,*/*;q=0.8",
      "Accept-Language: tr-TR,tr;q=0.8,en-US;q=0.6,en;q=0.4",
      "If-Modified-Since: Mon, 31 Mar 2008 19:38:05 GMT",
      "If-None-Match: "dd7-449c0ca64e140"",
      "Referer: http://XXXXX.XXX/XXXXXXXXXXXXXXXXXXXX",
      "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
      "TE: chunked",
      "x-cache-request: Level3",
      "host: i123.photobucket.com",
      "X-Forwarded-For: XX.XXX.XXX.XXX, XX.X.XX.XX",
      "Accept-Encoding: gzip",
  },
  worker = 0x7ef10872ea90 {
    ws = 0x7ef10872ecc8 {
      id = "wrk",
      {s,f,r,e}
Aug 19 16:08:47 v01 varnishd[30007]: child (31550) Started
Aug 19 16:08:47 v01 varnishd[30007]: Child (31550) said Child starts
Aug 19 16:08:47 v01 varnishd[30007]: Child (31550) said Dropped 0 segments to make free_reserve
Aug 19 16:08:47 v01 varnishd[30007]: Child (31550) said Silo completely loaded


Aug 19 16:10:14 v01 varnishd[30007]: Child (31550) not responding to CLI, killing it.
Aug 19 16:10:14 v01 varnishd[30007]: Child (31550) died signal=6 (core dumped)
Aug 19 16:10:14 v01 varnishd[30007]: Child (31550) Panic message:
Assert error in smp_oc_getobj(), storage_persistent_silo.c line 401:
  Condition((o)->magic == 0x32851d42) not true.
thread = (cache-worker)
ident = Linux,2.6.32-358.14.1.el6.x86_64,x86_64,-spersistent,-smalloc,-hcritbit,epoll
Backtrace:
  0x42f728: /usr/sbin/varnishd() [0x42f728]
  0x44cc2f: /usr/sbin/varnishd() [0x44cc2f]
  0x44cf3b: /usr/sbin/varnishd() [0x44cf3b]
  0x41814e: /usr/sbin/varnishd() [0x41814e]
  0x419195: /usr/sbin/varnishd(CNT_Session+0x655) [0x419195]
  0x431471: /usr/sbin/varnishd() [0x431471]
  0x3b89607851: /lib64/libpthread.so.0() [0x3b89607851]
  0x3b892e890d: /lib64/libc.so.6(clone+0x6d) [0x3b892e890d]
sp = 0x7ef110279008 {
  fd = 301, id = 301, xid = 1780069529,
  client = XX.X.XX.XX 56677,
  step = STP_FETCHBODY,
  handling = deliver,
  err_code = 200, err_reason = (null),
  restarts = 0, esi_level = 0
  flags =  is_gunzip
  bodystatus = 4
  ws = 0x7ef110279080 {
    id = "sess",
    {s,f,r,e} = {0x7ef110279c78,+848,(nil),+65536},
  },
  http[req] = {
    ws = 0x7ef110279080[sess]
      "GET",
      "/XXXXXX/XXXXX/XXXXXXXXXX/XXXXXXXXXXX/XXXXXXXXXXXXXXXX.jpg",
      "HTTP/1.1",
      "Cache-Control: max-age=31536000",
      "Connection: Persist,Keep-alive",
      "Date: Mon, 19 Aug 2013 22:10:10 GMT",
      "Via: 1.1 8.12.204.11:80 (Footprint 4.8/FPMCP)",
      "Accept: image/png,image/*;q=0.8,*/*;q=0.5",
      "Accept-Language: en-US,en;q=0.5",
      "Referer: http://XXX.XXXXXXX.XXX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXXXXXXX/",
      "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0",
      "TE: chunked",
      "DNT: 1",
      "x-cache-request: Level3",
      "host: i123.photobucket.com",
      "X-Forwarded-For: XX.XXX.XXX.XXX, XX.X.XX.XX",
      "Accept-Encoding: gzip",
  },
  worker = 0x7ef0fee14a90 {
    ws =
Aug 19 16:10:14 v01 varnishd[30007]: child (32678) Started
Aug 19 16:10:14 v01 varnishd[30007]: Child (32678) said Child starts
Aug 19 16:10:14 v01 varnishd[30007]: Child (32678) said Dropped 0 segments to make free_reserve
Aug 19 16:10:14 v01 varnishd[30007]: Child (32678) said Silo completely loaded


I would prefer to use the persistent backend, so if I can help
troubleshoot any problems with it, let me know.
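Both crashes dumped core, so if it helps whoever digs into this, a
backtrace from one of the cores should confirm the crash site.  Roughly
(the core filename below is a guess; check kernel.core_pattern for where
cores actually land on your system):

```shell
# Sketch: dump a full backtrace from a varnishd core non-interactively.
# The core path is a guess -- check /proc/sys/kernel/core_pattern for
# where cores are actually written on your box.
gdb -batch -ex 'bt full' -ex 'thread apply all bt' \
    /usr/sbin/varnishd /var/tmp/core.30008
```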

Bryan


On Fri, Aug 16, 2013 at 3:01 PM, Bryan Stillwell
<bstillwell at photobucket.com> wrote:
>
> Travis,
>
> Here are a couple more graphs showing the requests/sec and network
> traffic for both varnish and nginx:
>
> http://i1217.photobucket.com/albums/dd391/bstillwell_pb/Graphs/varnish-vs-nginx2.png
>
> As for the testing method, it involved having both servers equally
> weighted on an F5 load balancer taking production traffic.  The F5 was
> configured to re-use connections when possible with one-connect, and
> to use persistent hashing to always send the same requests to the same
> caching server.
>
> As for the configurations for both, I'll send those to you in a private email.
>
> Bryan
>
> On Fri, Aug 16, 2013 at 12:10 PM, Crowder, Travis
> <Travis.Crowder at penton.com> wrote:
> > Can you publish your testing methods and configurations for both Varnish
> > and Nginx?  Can you include the Network graphs and/or the Connections
> > graphs for Varnish and Nginx?
> >
> > -Travis
> >
> > On 8/16/13 12:29 PM, "Bryan Stillwell" <bstillwell at photobucket.com> wrote:
> >
> >>I'm looking at converting all of our nginx caching servers over to
> >>varnish, but I'm seeing some oddities with the 'file' storage backend
> >>that I'm hoping someone could shed some light on.
> >>
> >>Hardware specs for both machines:
> >>
> >>Dell PowerEdge R610
> >>  Dual Xeon E5620 @ 2.40GHz
> >>  64GiB memory
> >>  4x 240GB SSDs in a RAID5 attached to a PERC H700 RAID controller
> >>
> >>The following image compares CPU, memory, and IOPS for a machine
> >>running varnish (left) and one running nginx (right):
> >>
> >>http://i1217.photobucket.com/albums/dd391/bstillwell_pb/Graphs/varnish-vs-nginx.png
> >>
> >>At the most recent sample, each machine was handling ~350 requests/sec.
> >>
> >>As you can see, the varnish machine has a lot more CPU time dedicated
> >>to I/O wait, which matches up with its ~5x higher IOPS numbers.
> >>However, the biggest difference is that varnish is using ~25x more
> >>read IOPS than nginx.
> >>
> >>As for the jumps in the IOPS graph, I believe they can be explained by:
> >>
> >>Wednesday @ 10:00a: Started taking traffic
> >>Wednesday @ 11:00a: Memory cache filled, started using SSDs
> >>Wednesday @  4:00p: TTL of 6 hours was hit, objects started expiring
> >>Wednesday @  7:15p: SSD cache filled
> >>
> >>I pre-allocated the storage with fallocate (fallocate -l 450g
> >>/cache/varnish_storage.bin) to make sure that wasn't contributing
> >>to the issue.
> >>
> >>Any ideas on what could be tuned to reduce the number of read IOPS to be
> >>more in line with nginx?
> >>
> >>Thanks,
> >>Bryan
> >>
> >>_______________________________________________
> >>varnish-misc mailing list
> >>varnish-misc at varnish-cache.org
> >>https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc


