backend timeouts/503s vs grace cache

Sun Nov 5 08:18:28 UTC 2017

I wanted to add that I don't believe our grace TTL is too low: the site
has active monitoring that requests the front page every minute, which
should keep the cache primed. The problem is that the third-party
monitoring service intermittently reports downtime because of the 503
replies Varnish sends while the SSH tunnel to the backend is restarted.

On Nov 5, 2017 10:12, "Andrei" <lagged at gmail.com> wrote:

Hello everyone,

One of the backends we have configured runs through an SSH tunnel, which
occasionally gets restarted. While the tunnel is restarting, Varnish
returns a 503 because it can't reach the backend, even for pages which
would normally be cached (we force caching of the front page of the
related site). I believe our grace implementation might be incorrect, as
we would expect a graced cache object to be returned instead of a 503.

Our grace ttl is set to 21600 seconds based on a global variable:

sub vcl_backend_response {
  set beresp.grace = std.duration(variable.global_get("ttl_grace") + "s", 6h);
}

Our grace implementation in sub vcl_hit is:

  sub vcl_hit {
    # We have no fresh fish. Let's look at the stale ones.
    if (std.healthy(req.backend_hint)) {
      # Backend is healthy. Limit age to 10s.
      if (obj.ttl + 10s > 0s) {
        #set req.http.grace = "normal(limited)";
        std.log("OKHITDELIVER: obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
        return (deliver);
      } else {
        # No candidate for grace. Fetch a fresh object.
        std.log("No candidate for grace. Fetch a fresh object. obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
        return (miss);
      }
    } else {
      # backend is sick - use full grace
      if (obj.ttl + obj.grace > 0s) {
        #set req.http.grace = "full";
        std.log("SICK DELIVERY: obj.hits: " + obj.hits + " obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
        return (deliver);
      } else {
        # no graced object.
        std.log("No graced object. obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
        return (miss);
      }
    }

    # fetch & deliver once we get the result
    return (miss); # Dead code, keep as a safeguard
  }

Occasionally we see:
-   VCL_Log        No candidate for grace. Fetch a fresh object. obj.ttl:-1369.659 obj.keep: 0.000 obj.grace: 21600.000

For the most part, it's:
-   VCL_Log        OKHITDELIVER: obj.ttl:26.872 obj.keep: 0.000 obj.grace: 21600.000

Are we setting the grace ttl too low perhaps?
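
For reference, the branch taken for each of the two log lines above can be
checked by hand. The following is only a minimal sketch that models the
vcl_hit conditions in Python; it is not Varnish code. The function name
vcl_hit_decision and its parameters are hypothetical, and the backend is
assumed to be reported healthy for the stale case because the "No candidate
for grace" message is only logged in the healthy branch.

```python
def vcl_hit_decision(ttl, grace, backend_healthy, healthy_window=10.0):
    """Model the branch conditions of the vcl_hit above (all values in seconds)."""
    if backend_healthy:
        # Healthy branch: only objects expired less than 10s ago are delivered.
        return "deliver" if ttl + healthy_window > 0 else "miss"
    # Sick branch: anything still inside the grace period is delivered.
    return "deliver" if ttl + grace > 0 else "miss"


# Values copied from the two log lines above.
stale = vcl_hit_decision(ttl=-1369.659, grace=21600.0, backend_healthy=True)
fresh = vcl_hit_decision(ttl=26.872, grace=21600.0, backend_healthy=True)

# Hypothetical case: the same stale object with the backend reported as sick.
stale_sick = vcl_hit_decision(ttl=-1369.659, grace=21600.0, backend_healthy=False)

print(stale, fresh, stale_sick)  # miss deliver deliver
```

With the logged values, the stale object is about 20230 seconds inside its
grace period, yet the healthy branch still returns miss because only the
10 second window is checked there.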