[master] a92f9aa Document grace and keep, and how to deal with problematic servers

Mon Mar 12 12:50:10 UTC 2018

commit a92f9aa986bf43526316e385440412f7daba37d6
Author: Pål Hermunn Johansen <hermunn at varnish-software.com>
Date:   Mon Mar 12 13:48:00 2018 +0100

    Document grace and keep, and how to deal with problematic servers
    
    This commit comes with the realization that finding a workaround
    for #1799 is even harder than I thought.
    
    Unfortunately we cannot close issue #1799 for the reasons described
    in the documentation.
    
    Thanks to @nigoroll for valuable feedback.

diff --git a/doc/sphinx/users-guide/vcl-grace.rst b/doc/sphinx/users-guide/vcl-grace.rst
index 4b0a908..8db2a46 100644
--- a/doc/sphinx/users-guide/vcl-grace.rst
+++ b/doc/sphinx/users-guide/vcl-grace.rst
@@ -1,11 +1,15 @@
 .. _users-guide-handling_misbehaving_servers:
 
-Misbehaving servers
+Grace mode and keep
 -------------------
 
-A key feature of Varnish is its ability to shield you from misbehaving
-web- and application servers.
+Sometimes you want Varnish to serve content that is somewhat stale
+instead of waiting for a fresh object from the backend. For example,
+if you run a news site, serving a main page that is a few seconds old
+is not a problem if this gives your site faster load times.
 
+In Varnish this is achieved by using `grace mode`. A related idea
+is `keep`, which is also explained here.
 
 Grace mode
 ~~~~~~~~~~
@@ -19,52 +23,199 @@ If you are serving thousands of hits per second the queue of waiting
 requests can get huge. There are two potential problems - one is a
 thundering herd problem - suddenly releasing a thousand threads to
 serve content might send the load sky high. Secondly - nobody likes to
-wait. To deal with this we can instruct Varnish to keep
-the objects in cache beyond their TTL and to serve the waiting
-requests somewhat stale content.
+wait.
+
+Setting an object's `grace` to a positive value tells Varnish that it
+should serve the object to clients for some time after the TTL has
+expired, while Varnish fetches a new version of the object. The default
+value is controlled by the runtime parameter ``default_grace``.
+
+Keep
+~~~~
+
+Setting an object's `keep` tells Varnish that it should keep an object
+in the cache for some additional time. There are two reasons to do this:
 
-So, in order to serve stale content we must first have some content to
-serve. So to make Varnish keep all objects for 2 minutes beyond their
-TTL use the following VCL::
+* To use the object to construct a conditional GET backend request (with
+  If-Modified-Since: and/or If-None-Match: headers), allowing the backend
+  to reply with a 304 Not Modified response, which may be more efficient
+  on the backend and saves re-transmitting the unchanged body.
+* To be able to serve the object when grace has expired but we have a
+  problem with getting a fresh object from the backend. This will require
+  a change in ``sub vcl_hit``, as described below.
+
+The values are additive, so if grace is 10 seconds and keep is 1 minute,
+then objects will survive in cache for 70 seconds after the TTL has
+expired.
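+
+As a worked example of this arithmetic, here is a minimal sketch using the
+same grace and keep values. The 1 minute TTL is an assumption made only
+for illustration::
+
+  sub vcl_backend_response {
+       // Fresh for 60s after the fetch (illustrative TTL)
+       set beresp.ttl = 1m;
+       // 60s to 70s: stale, but may be served while a refresh runs
+       set beresp.grace = 10s;
+       // 70s to 130s: kept in cache, but not served by the default VCL
+       set beresp.keep = 1m;
+       // Removed roughly 130s after the fetch: ttl + grace + keep
+  }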
+
+Setting grace and keep
+~~~~~~~~~~~~~~~~~~~~~~
+
+We can use VCL to make Varnish keep all objects for 10 minutes beyond
+their TTL with a grace period of 2 minutes::
 
   sub vcl_backend_response {
-    set beresp.grace = 2m;
+       set beresp.grace = 2m;
+       set beresp.keep = 8m;
   }
 
-Now Varnish will be allowed to serve objects that are up to two
-minutes out of date. When it does it will also schedule a refresh of
-the object. This will happen asynchronously and the moment the new
-object is in it will replace the one we've already got.
+The effect of grace and keep
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For most users, setting the default grace and/or a suitable grace for
+each object is enough. The default VCL will do the right thing and
+behave as described above. However, if you want to customize how Varnish
+behaves by changing ``sub vcl_hit``, then you should know some of the
+details on how this works.
+
+When ``sub vcl_recv`` ends with ``return (lookup)`` (which is the
+default behavior), Varnish will look for a matching object in its
+cache. Then, if it only found an object whose TTL has run out, Varnish
+will consider the following:
+
+* Is there already an ongoing backend request for the object?
+* Is the object within the `grace period`?
 
-You can influence how this logic works by adding code in vcl_hit. The
-default looks like this::
+Then, Varnish reacts using the following rules:
+
+* If there is no backend request for the object, one is scheduled and
+  ``sub vcl_hit`` is called immediately.
+* If there is a backend request going on and the object is within grace,
+  ``sub vcl_hit`` is called immediately.
+* If there is a backend request going on, but the grace has expired,
+  processing is halted until the backend request has finished and a
+  fresh object is available.
+
+Note that the backend fetch happens asynchronously, and as soon as the
+new object has arrived it replaces the one already in the cache.
+
+If you do not define your own ``sub vcl_hit``, then the default one is
+used. It looks like this::
 
   sub vcl_hit {
-     if (obj.ttl >= 0s) {
-         // A pure unadulterated hit, deliver it
-         return (deliver);
-     }
-     if (obj.ttl + obj.grace > 0s) {
-         // Object is in grace, deliver it
-         // Automatically triggers a background fetch
-         return (deliver);
-     }
-     // fetch & deliver once we get the result
-     return (fetch);
+       if (obj.ttl >= 0s) {
+            // A pure unadulterated hit, deliver it
+            return (deliver);
+       }
+       if (obj.ttl + obj.grace > 0s) {
+            // Object is in grace, deliver it
+            // Automatically triggers a background fetch
+            return (deliver);
+       }
+       // fetch & deliver once we get the result
+       return (miss);
   }
 
-The grace logic is pretty obvious here. If you have enabled
-:ref:`users-guide-advanced_backend_servers-health` you can check if
-the backend is sick and only serve graced object then. Replace the
-second if-clause with something like this::
+If you follow the code, you see that Varnish delivers graced objects
+while fetching fresh copies, but if grace has expired the clients have to
+wait until a new copy is available.
+
+Misbehaving servers
+~~~~~~~~~~~~~~~~~~~
+
+A key feature of Varnish is its ability to shield you from misbehaving
+web- and application servers.
+
+If you have enabled :ref:`users-guide-advanced_backend_servers-health`
+you can check if the backend is sick and modify the behavior when it
+comes to grace. There are essentially two ways of doing this. You can
+explicitly deliver a kept object (one that is past its grace) when you see
+that the backend is sick, or you can explicitly `not` serve an expired
+object when you know that the backend is healthy. The two methods have
+slightly different characteristics, as we shall see.
 
-   if (!std.healthy(req.backend_hint) && (obj.ttl + obj.grace > 0s)) {
-         return (deliver);
-   } else {
-         return (fetch);
+In both cases we assume that you avoid inserting objects into the cache
+when you get certain errors from the backend, for example by using the
+following::
+
+  sub vcl_backend_response {
+       if (beresp.status == 503 && bereq.is_bgfetch) {
+            return (abandon);
+       }
+  }
+
+Method 1: When the backend is healthy, use a lower grace value
+==============================================================
+
+Imagine that you have set an object's grace to a high value that you
+wish to use when the backend is sick, for example::
+
+  sub vcl_backend_response {
+       set beresp.grace = 24h;
+       // no keep
+  }
+
+Then you can use the following code as your ``sub vcl_hit``::
+
+   if (std.healthy(req.backend_hint)) {
+        // change the behavior for healthy backends: Cap grace to 10s
+        if (obj.ttl + obj.grace > 0s && obj.ttl + 10s > 0s) {
+             return (deliver);
+        } else {
+             return (miss);
+        }
    }
 
-So, to sum up, grace mode solves two problems:
- * it serves stale content to avoid request pile-up.
- * it serves stale content if you allow it.
+The effect of this is that, when the backend is healthy, objects with
+grace above 10 seconds will have an `effective` grace of 10 seconds.
+When the backend is sick, the default VCL kicks in, and the long grace
+is used.
+
+This method has one potentially serious problem when more than one
+client asks for an object whose TTL has expired. If the second of
+these requests arrives after the effective grace, but before the first
+request has completed, then the second request will be turned into a
+`pass`.
+
+In practice this method works well in most cases, but if you
+experience excessive `pass` behavior, this translates to a reduced
+hit rate and higher load on the backend. When this happens you will
+see the error message `vcl_hit{} returns miss without busy object` in
+the log.
+
+Method 2: When the backend is sick, deliver kept objects
+========================================================
+
+With this method, we assume that we have used ``sub vcl_backend_response``
+to set `beresp.grace` to a value that is suitable for healthy backends,
+and with a `beresp.keep` that corresponds to the time we want to serve
+the object when the backend is sick. For example::
+
+  sub vcl_backend_response {
+       set beresp.grace = 10s;
+       set beresp.keep = 24h;
+  }
+
+The appropriate code for ``vcl_hit`` then becomes::
+
+   if (!std.healthy(req.backend_hint) && (obj.ttl + obj.grace + obj.keep > 0s)) {
+        return (deliver);
+   }
+
+Typically you can omit the second part of the if test, because the
+expiry thread deletes objects once their TTL, grace and keep have all
+run out. The expiry thread may lag slightly behind, but
+for almost all practical purposes you are probably fine with the
+following::
+
+   if (!std.healthy(req.backend_hint)) {
+        return (deliver);
+   }
+
+The problem with this solution concerns requests that are waiting for
+a backend fetch to finish. If the backend fetch gets to ``return
+(abandon)``, then all the requests that are waiting will get to ``sub
+vcl_hit`` with an `error object` created by the error handling
+code/VCL. In other words, you risk that some clients will get errors
+instead of the more desirable stale objects.
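+
+Put together, a configuration for this method might look like the
+following sketch. The backend address and the probe settings are
+assumptions chosen for illustration, not recommendations, and the caveat
+about waiting requests still applies::
+
+  vcl 4.0;
+
+  import std;
+
+  // Hypothetical backend; the probe is what lets std.healthy() report
+  // the backend as sick
+  backend default {
+       .host = "127.0.0.1";
+       .port = "8080";
+       .probe = {
+            .url = "/";
+            .interval = 5s;
+            .timeout = 1s;
+            .window = 5;
+            .threshold = 3;
+       }
+  }
+
+  sub vcl_backend_response {
+       // Do not replace a stale object with an error from a background fetch
+       if (beresp.status == 503 && bereq.is_bgfetch) {
+            return (abandon);
+       }
+       // Grace that applies while the backend is healthy
+       set beresp.grace = 10s;
+       // Extra time during which a sick backend can be covered
+       set beresp.keep = 24h;
+  }
+
+  sub vcl_hit {
+       // Backend is sick: serve what is still in cache, even past grace
+       if (!std.healthy(req.backend_hint) &&
+           (obj.ttl + obj.grace + obj.keep > 0s)) {
+            return (deliver);
+       }
+       // Otherwise fall through to the built-in vcl_hit
+  }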
+
+Summary
+~~~~~~~
+
+Grace mode allows Varnish to deliver slightly stale content to clients while
+getting a fresh version from the backend. The result is faster load times
+at a low cost.
 
+It is possible to change the behavior when it comes to grace and keep, for
+example by changing the `effective` grace depending on the health of the
+backend, but you have to be careful.