Our experience with the persistent branch

Dererk dererk at deadbeef.com.ar
Thu Oct 24 00:06:26 CEST 2013


Hi there!

I thought it would be useful for some people out there if I shared a
success story about the persistent branch of Varnish Cache at a large
production scale.
Ours is the replacement of the so-long-appreciated Squid Cache for
delivering (mostly) user-provided content, like images and other kinds
of documents, but also static objects belonging to our platform, OLX,
such as JavaScript files and style sheets.

- TL;DR Version
The development around the persistent branch rocks, *big time*!

- Full Version
//  *Disclaimer*
//  Our configuration is not what one could call "conventional", and
//  we can't really afford to lose large amounts of cache in short
//  periods of time.


- The Business

Our company, OnLine eXchange, or simply OLX for short, has established
its core around the free classifieds business, enabling sellers and
buyers to perform peer-to-peer selling transactions with no commission
charged to either party. Our portal lets sellers upload images to
better describe their items on our platform, making them more
attractive to potential buyers and improving their chances of selling.
Just like Craigslist does in the US or Allegro in Poland.

Although some particular countries are more popular than others, we
have a large amount of traffic worldwide, due to an extensive network
of operations and localizations.
To sum up, this translates into having to tune every single piece of
software we run out there, as we deliver several Gbps of traffic to
the internet.


- Massive Delivery

Even though we rely heavily on content delivery networks (CDNs) for
last-mile optimization, geo-caching and network-optimized delivery,
there are limits to what they can handle, and they also run cache
rotation algorithms over objects (say, something like LRU). They do
this all the time, more often than you actually want.
On the other hand, because of their geo-distributed nature, in reality
several requests are made from different locations before they
effectively cache an object, and then only for some small period of
time.

It might be true that on average the CDNs offload 80%+ of our traffic,
but in reality even the remaining portion averages 400 Mbit/s, with
frequent 500 Mbit/s spikes at business-as-usual levels.


- Ancient History

We used to run this internal caching tier on Squid Cache, with the
help of tons of optimizations and an extremely experimental storage
backend called the Cyclic Object Storage System, or COSS. This very
experimental backend made use of very granular storage configurations
to get the most out of every IOP we could spare our loaded storage
backends.
Due to its immaturity, some particular operations were painful, like a
*30-minute downtime* per instance when restarting the Squid engine,
caused by COSS data rebuilds and consistency checks.
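For illustration, a COSS store in squid.conf looked something like the
line below (the path and sizes are made up, not our production
values); block-size and max-size were the kind of knobs that let us
match the store to our object sizes and squeeze out IOPs:

    # hypothetical COSS store: 8 GB cyclic file, 512-byte blocks,
    # holding only objects of up to 1 MB
    cache_dir coss /var/spool/squid/coss0 8192 block-size=512 max-size=1048576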

We also relied very heavily on a sibling cache relationship configured
through HTCP (a sketch follows below), as a way to maximize caching
availability and spare some requests from ever reaching the origin,
which, as I said, was heavily loaded almost all the time.
Squid is a great tool, and believe me when I say that we loved it with
all the downsides it had. We ran it for several years up to very
recently, and we knew it backwards and forwards by then.
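A minimal sketch of that kind of sibling setup in squid.conf
(hostnames are made up); the fourth field is the HTCP port, and the
htcp flag switches the inter-cache protocol from ICP to HTCP:

    # ask a sibling before going to the loaded origin; proxy-only
    # avoids storing a second copy of what the sibling already holds
    cache_peer sibling1.example.com sibling 3128 4827 htcp proxy-only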


- Modern History

Things started to change for the worse recently, when our
long-standing, loaded storage backend tier, featuring a vendor
supposedly sitting in the storage top-five ranking, could not face our
business-as-usual operations any longer.
Performance started to fall apart once we began to hit 95%+ CPU usage
on this well-known storage provider's solution, which became unable to
serve objects in good shape once that level was reached.
We were forced to start diving into new and radical alternatives on
very short notice.

We had been using Varnish internally for some years by then, for
boosting our SOLR backend and some other HTTP caching roles, but not
for delivering static content up to that moment.
Now was the time to give it a chance "for real" (we have several Gbps
of traffic internally too, but you know what I mean).


- First Steps

We started by using the same malloc storage backend configuration as
we were using in the other areas where Varnish was deployed, with some
performance tuning around sessions (VARNISH_SESSION_LINGER) and thread
capacity (VARNISH_MAX_THREADS, VARNISH_THREAD_POOLS and
VARNISH_THREAD_TIMEOUT).
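As a rough sketch, assuming a Red Hat-style /etc/sysconfig/varnish
feeding varnishd its options (the values below are illustrative, not
our production numbers; VARNISH_SESSION_LINGER and
VARNISH_THREAD_POOLS are not in the stock file, so they have to be
wired into DAEMON_OPTS as -p parameters):

    VARNISH_STORAGE="malloc,64G"   # in-RAM storage backend
    VARNISH_MIN_THREADS=500
    VARNISH_MAX_THREADS=4000
    VARNISH_THREAD_TIMEOUT=300     # seconds before idle threads exit
    VARNISH_THREAD_POOLS=8         # e.g. one pool per CPU core
    VARNISH_SESSION_LINGER=100     # ms a worker lingers on a session

    DAEMON_OPTS="-a :6081 \
                 -w ${VARNISH_MIN_THREADS},${VARNISH_MAX_THREADS},${VARNISH_THREAD_TIMEOUT} \
                 -p thread_pools=${VARNISH_THREAD_POOLS} \
                 -p session_linger=${VARNISH_SESSION_LINGER} \
                 -s ${VARNISH_STORAGE}"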

The server profile handling this task consisted of some huge boxes
with 128 GB of RAM, but since the 50-ish terabyte dataset didn't
actually fit in RAM, once all the memory was allocated we started to
suffer random panics at random times. Unable to replicate them on
demand or to produce any debugging output, due to the amount of
traffic these huge devils handled in production, things started to get
really ugly really fast.


- The Difficult Choice

At this point we started to consider every possible option available
out there, even switching away to other caching alternatives that
provided persistence for cached objects, like Traffic Server. We
decided to give the persistent backend a shot, but there was a catch:
the persistent backend was (and currently still is) considered
experimental (then again, so was COSS!).
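For reference, switching to it is just a matter of pointing varnishd
at a silo file; a minimal sketch (the path and size are made up):

    # persistent storage: the silo survives varnishd restarts
    varnishd -a :6081 -f /etc/varnish/default.vcl \
             -s persistent,/var/lib/varnish/silo.bin,100G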

Effectively, as far as the 3.0.4 release on the main stable branch
goes, the persistent backend had many bugs that showed up within
minutes, many of them producing panics that crashed the child process.
But, fortunately for us, the persistence itself worked so well that
the first crashes went by totally unnoticed, which, compared with
experiencing a 100 GB cache meltdown, was just an amazing improvement
by itself. Now things started to look better for Varnish.
Of course, we learned that we were in fact losing some cached objects
to broken silos along the way, but in a side-by-side comparison things
looked as different as night and day, and the best was yet to come.

We were advised, in case we were to stick with a persistent backend,
to use the persistent development branch, given that more improvements
were being developed there and major stability changes had been
introduced. But, think again: proposing something that has both the
"experimental" and "in development" tags hanging from it usually sells
horribly to the management people on the other side of the table.


- Summary

In the end, with the help of a hash-based balancing algorithm at the
load-balancing tier in front of our Varnish caches, we were able to
*almost cut in half* the CPU usage of our storage solution tier, that
is, getting 60% instead of 90%+ CPU usage, something similar to a
sunny-day walk through the park, even while serving the 2,000+
requests/second arriving at our datacenters.
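For the curious, a minimal sketch of that kind of hash-based
balancing, assuming HAProxy at the balancing tier (the balancer
product, names and addresses here are assumptions, not our actual
setup). Hashing on the URI pins each object to a single Varnish node,
so the caches don't dilute each other's hit rate:

    backend varnish_static
        balance uri              # hash the request URI to pick a node
        hash-type consistent     # limit remapping when a node goes down
        server varnish01 10.0.0.11:6081 check
        server varnish02 10.0.0.12:6081 check
        server varnish03 10.0.0.13:6081 check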

We got there by offloading content at *up to 70%+ cache hits*,
something that was totally inconceivable to anyone at the very
beginning of the migration, given that we used to get less than 30%
with Squid in the past.

We were able to get to this point with lots of patience and research,
but particularly with the help of the Varnish core developers, who
constantly supported us on the IRC channels and mailing lists.

Thanks a lot guys, you and Varnish rock, big time!



A happy user!

Dererk

-- 
BOFH excuse #274:
It was OK before you touched it.

