Backend Fetch failed

Rodney Bizzell rbizzell at measinc.com
Thu Apr 6 20:46:58 CEST 2017


I will definitely make those changes; I appreciate your help. I made the change under .probe (.url = "/";) and the site started working.

-----Original Message-----
From: Geoff Simmons [mailto:geoff at uplex.de]
Sent: Thursday, April 06, 2017 1:38 PM
To: Rodney Bizzell <rbizzell at measinc.com>
Cc: varnish-misc at varnish-cache.org
Subject: Re: Backend Fetch failed

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

For problems like this, *always look for the FetchError entry in the backend logs*.

> << BeReq    >> 65547
[...]
> -   FetchError     no backend connection
> -   Timestamp      Beresp: 1491485655.912871 0.000051 0.000051
> -   Timestamp      Error: 1491485655.912878 0.000059 0.000007
[...]
> -   End

The client-side logs, on the other hand, frankly don't matter -- not for the purposes of diagnosing the problem with the backend fetch. So I'll just ignore them altogether.

> *   << Request  >> 65546
[...]
> -   End

> *   << Request  >> 5
[...]
> -   End

> *   << BeReq    >> 6
[...]
> -   FetchError     no backend connection
> -   Timestamp      Beresp: 1491485659.606340 0.000056 0.000056
> -   Timestamp      Error: 1491485659.606347 0.000062 0.000006
[...]

FetchError "no backend connection" very likely means, in this case, that your backend is failing its health checks, so that Varnish determines that there is no healthy backend to which it can direct the requests.

There is one other possibility for "no backend connection", which is that Varnish attempted to initiate a network connection to the backend, but the connection could not be obtained before connect_timeout expired. In that case, the timestamps would have shown that almost exactly as much time as connect_timeout would have been taken, which for your config would be very obvious (more about that further down). But as you see here in the Timestamp entries, Varnish determined the error after about 50 microseconds, which is near-certain proof that the health checks failed (about enough time for Varnish to check its record that the backend is unhealthy).

You can see the results of the health checks in the log, but for that you need raw grouping, since health checks are not transactional (they are not part of requests/responses that Varnish serves):

$ varnishlog -g raw -i Backend_health

Your health checks are probably failing because you've written the probes incorrectly:

backend drupal {
[...]
    .probe = {
        .url = "drupal.miat.com";
[...]
     }
}

This is a very common misunderstanding: "url" in the conceptual world of Varnish only ever refers to the *path*; the domain should not appear there. So your probes should say something like:

backend drupal {
[...]
    .probe = {
        .url = "/"; # or whatever path should be used for probes
[...]
     }
}
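
If the backend needs the domain to pick the right virtual host, the domain belongs in a Host header rather than in .url. A minimal sketch of one way to do that, using .request in place of .url; the .host and .port values are hypothetical placeholders, not taken from your config:

```vcl
vcl 4.0;

backend drupal {
    .host = "192.0.2.10";   # hypothetical address, substitute your own
    .port = "80";           # hypothetical port
    .probe = {
        # .request is used instead of .url; each string becomes one
        # line of the probe request.
        .request =
            "GET / HTTP/1.1"
            "Host: drupal.miat.com"
            "Connection: close";
    }
}
```

The path is still just "/" here; only the Host header carries the domain.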

Even after you fix that, you're really taking chances with the short timeout for the probes:

    .probe = {
[...]
        .timeout = 60ms;
[...]
     }

Are you sure that your backends will always respond to the health probes within 60 milliseconds? Set it to 1s and give them a chance.
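
As a rough sketch, a probe with the path fixed and a more forgiving timeout might look like the following. The .host, .port, .interval, .window and .threshold values are assumptions for illustration only; tune them for your own backends:

```vcl
vcl 4.0;

backend drupal {
    .host = "192.0.2.10";   # hypothetical address, substitute your own
    .port = "80";           # hypothetical port
    .probe = {
        .url = "/";         # path only, no domain
        .timeout = 1s;      # give the backend a chance to answer
        .interval = 5s;     # assumed: send one probe every 5 seconds
        .window = 5;        # assumed: consider the last 5 probes
        .threshold = 3;     # assumed: at least 3 of them must succeed
    }
}
```

With these illustrative numbers the backend counts as healthy as long as at least 3 of the last 5 probes got a good response within 1 second.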

That, I think, is the cause of your 503 problem, but I also have to say something about the timeouts you have set for all of your backends:

    .connect_timeout = 6000s;
    .first_byte_timeout = 6000s;
    .between_bytes_timeout = 6000s;

Those timeouts are astonishingly, appallingly, gobsmackingly too long.
Just looking at that is almost making my head explode.

This is another common mistake: setting the Varnish timeouts to "forever and ever and ever". On the contrary, you're much better off if the timeouts are set to *fail fast*.

Setting your timeouts to 100 minutes helps absolutely no one at all -- it means that a worker thread in Varnish will sit idle for 100 minutes, waiting for something to happen. Worker threads are a limited resource in Varnish; you want them to keep doing useful work, and give up relatively soon if a backend is not responding. If there is a serious problem in your system, so that many backends are not responding, then your worker threads will all go idle waiting for the timeouts to expire, and Varnish will have to start new threads.
Eventually the maximum number of threads will be reached, and when that happens, Varnish will start to refuse new requests, which usually means that your site goes down altogether. It's a recipe for disaster.

Rest assured that if your backend has not responded for 5999 seconds, then it's not going to respond in the 6000th second either. It's not responding at all.

Consider just going with the default timeouts, or with something on the order of 6 seconds, rather than 6000 seconds. Or maybe 60 seconds, but that's already getting too long. If your backend developers can't get their apps to respond within a few seconds, then go yell at them until they do. As the Varnish admin, you *cannot* solve that problem for them, by setting your timeouts to "until the end of time".
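
For illustration, a backend with timeouts on the order of 6 seconds could be sketched like this. The .host and .port values are hypothetical, and the 6s figures are only the ballpark mentioned above, not tuned recommendations; simply omitting the three timeout lines falls back to the defaults:

```vcl
vcl 4.0;

backend drupal {
    .host = "192.0.2.10";           # hypothetical address, substitute your own
    .port = "80";                   # hypothetical port
    .connect_timeout = 6s;          # give up if no connection is established
    .first_byte_timeout = 6s;       # give up if the response does not start
    .between_bytes_timeout = 6s;    # give up if the response stalls midway
}
```

With values like these, a worker thread stuck on a dead backend is released after seconds rather than 100 minutes.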


HTH,
Geoff
- --
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJY5n0OAAoJEOUwvh9pJNUReBEP/1tFl8TigTIpQgng09dNc9jT
XYLLnFxZXFsjDsENX8kkemyk94AfW95AOpbFNoqtALkGiHLXTDWy0h++Lw3hT1ll
GxS8m5/qQ1+IpXXHpjHC86et1PTq7aKWtNTud0riA4b9jirlNYcdk/zaZCB/zRyA
5FzHh7By3LzJZ6qHXycYZWBy3PUQZfG1awX3VWtOzj+UP/hfHIlb6CcY97uF/8L9
Z7uff42o14iYFCGyALsy0JP3la/3qtb1tuzTn1vgqvBM9pVTdRKQXmL9Q/8XsX+Z
ySdHMaGG8/5WnUFznwXayEN84Y5fdYk6ZzGbAV3sZtQJkpHXquhj/LRQYDIjxESp
ILDh/FobMqevvXFBL/IcjaEj22xYyviu/8fYK+/QPfQ2yv5B0FWX1yIQDyNZx+4e
37XVDd96EMxA/t1XfTVk2DGw9kEtFPmLdatQx487vJsd4OyT3HX6Tiug5T2pHyPT
H/a2qKoRMOySD9i0SYMJG0v81Fi/jrJknZJZ/WHAIo4GAs2CRvFH+oI2/USMQzPj
brT/JeyVGOUObXkVA1uEYtrucUU07qOtdeVP5RBs6zaULJyu/KbIIF0cQMd0YBam
yXBwNVl89ec1RIcHl7TuTzNQ0euqgFyNZW1OAlQIbJKDProf6BHsyGwAXX7jexio
PkdtqxaiGBWa5OBR4Gws
=p7Ha
-----END PGP SIGNATURE-----




