high iowait

Wed Mar 20 14:27:58 CET 2013

On Wed, Mar 20, 2013 at 9:12 AM, Lasse Karstensen <
lkarsten at varnish-software.com> wrote:

> On Tue, Mar 19, 2013 at 09:46:37PM -0400, Sean Allen wrote:
> > One of our Varnish servers is spending about 40-50% of its time in iowait.
> > Is this just from the varnishlog getting written? Our IO performance is not
> > great and I'm looking to be able to get the amount of time we are spending
> > doing IO down. This occurs even when everything is running nicely in memory
> > and we aren't overflowing into swap (which was detailed in a different
> > email as I think they might be different issues).
> > varnishd (varnish-3.0.3 revision 9e6a70f)
> [..]
> > # Without "ban_lurker_sleep," nothing banned from the cache ever gets
> > # evicted.
>
> Ban lurker default sleep is 10ms in 3.0.3. You are increasing it a lot,
> why?
>
>
Lack of knowledge, when we put it together, about what a good value would be.
My understanding is that the less frequently the ban lurker runs, the more
memory is used, but performance is slightly better. Is that incorrect?

> Are you using smart bans?
>
>
> https://www.varnish-software.com/static/book/Cache_invalidation.html#smart-bans

A combination of smart bans and purges.

Any invalidation that comes from the backend is handled via a smart ban.
There is legacy code that causes db updates that would send a purge via
port 6081 to invalidate entities that were updated 'out of band'.

The vast majority come via the 'out of band' purges.

>
>
> > DAEMON_OPTS="-a :6081 \
> >              -T :6082 \
> >              -f /etc/varnish/default.vcl \
> >              -u varnish -g varnish \
> >              -p thread_pool_min=500 \
> >              -p thread_pool_max=5000 \
> >              -s malloc,6G \
> >              -p thread_pools=2 \
> >              -p thread_pool_add_delay=1 \
> >              -p ban_lurker_sleep=60s"
>
> Your thread count is a bit on the high side (10k); ~1 thread / concurrent
> connection is good.
>

So "workers created" says 1000. Nothing ever gets queued. On this Varnish we
don't get to 1000 concurrent connections. On another we regularly bounce up
to 3k concurrent connections, but the worker count stays at 1000.

From the documentation, I've never been entirely clear about how the thread
pool sizes work.
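
A rough sketch of the arithmetic, assuming (as in the Varnish 3.x parameter
documentation) that thread_pool_min and thread_pool_max apply to each pool
rather than to the whole process. The numbers come from the DAEMON_OPTS
quoted above; the function name is only illustrative.

```python
# Values taken from the DAEMON_OPTS quoted in this thread.
THREAD_POOLS = 2
THREAD_POOL_MIN = 500
THREAD_POOL_MAX = 5000


def total_threads(pools: int, per_pool: int) -> int:
    """Total worker threads, assuming the limit is applied per pool."""
    return pools * per_pool


min_total = total_threads(THREAD_POOLS, THREAD_POOL_MIN)
max_total = total_threads(THREAD_POOLS, THREAD_POOL_MAX)

print(f"workers started up front: {min_total}")
print(f"upper bound on workers:   {max_total}")
```

Under that assumption the 1000 workers created match the configured minimum
(2 x 500), and the "10k" mentioned above is the configured maximum (2 x 5000).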

I would have thought that, as our one-second rate for connections was 3k, we
would create more workers. Is it that requests finish in less than 1 second,
so 1000 workers can handle the 3k without queueing? If requests were going to
queue, we would start to spin up more workers, yes?
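
One way to reason about this is Little's law: the average number of busy
workers is roughly the arrival rate times the average time each request holds
a worker. The sketch below only illustrates that arithmetic. The 50 ms service
time is a made-up figure, not a measurement from these servers, and the
function names are hypothetical.

```python
def busy_workers(requests_per_s: float, service_time_ms: float) -> float:
    """Little's law: average concurrency = arrival rate * time in service."""
    return requests_per_s * service_time_ms / 1000.0


def max_service_time_ms(workers: int, requests_per_s: float) -> float:
    """Longest average service time a fixed worker count can absorb."""
    return workers * 1000.0 / requests_per_s


# 3k requests per second, assuming (hypothetically) 50 ms per request.
print(busy_workers(3000, 50))

# Average service time at which 1000 workers would be fully occupied at 3k/s.
print(round(max_service_time_ms(1000, 3000), 1))
```

If the average request holds a worker for well under a third of a second,
1000 workers can in principle keep up with 3k requests per second without
queueing, which would be consistent with the worker count staying at the
minimum.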

> But it doesn't explain why you are in iowait.
> Is this running on a virtualised server? Do you have proper network
> drivers?
>

VMware on CentOS 5.x.
Drivers are good.

I'm hoping that virtualized isn't the culprit but I won't be surprised if
it is.
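
To see how the iowait figure compares with steal time on a virtualised
guest, the aggregate "cpu" line of /proc/stat can be sampled twice and the
difference broken down by field. This is a minimal sketch, assuming the usual
Linux field order (user, nice, system, idle, iowait, irq, softirq, steal). The
two sample lines are invented for illustration and are not from these servers.

```python
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def cpu_shares(before: str, after: str) -> dict:
    """Percentage of CPU time per state between two /proc/stat 'cpu' lines."""
    # Skip the leading "cpu" label and keep the first eight counters.
    b = [int(x) for x in before.split()[1:9]]
    a = [int(x) for x in after.split()[1:9]]
    deltas = [y - x for x, y in zip(b, a)]
    total = sum(deltas)
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


# Hypothetical samples; on a real host, read the first line of /proc/stat
# twice, a few seconds apart.
sample_before = "cpu 100 0 50 800 50 0 0 0"
sample_after = "cpu 200 0 100 1200 500 0 0 0"

shares = cpu_shares(sample_before, sample_after)
print(f"iowait: {shares['iowait']:.1f}%  steal: {shares['steal']:.1f}%")
```

A high iowait share alongside a non-trivial steal share would point more
towards the hypervisor or shared storage than towards Varnish itself.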

Thanks,
Sean