[master] f3575e382 Avoid deadlock in VRT_AddDirector() when VCL is going cold

Nils Goroll nils.goroll at uplex.de
Mon Feb 5 23:44:03 UTC 2024


commit f3575e382cf00f5cb5a69cbc1f5eb1f42246c87a
Author: Nils Goroll <nils.goroll at uplex.de>
Date:   Wed Jan 31 09:25:04 2024 +0100

    Avoid deadlock in VRT_AddDirector() when VCL is going cold
    
    If VRT_AddDirector() was called from handling a VCL_COLD event or,
    indirectly, from another thread which the VCL_COLD event handler was
    waiting for, varnishd would deadlock and prevent any CLI or director
    changes, because VRT_AddDirector() requires the vcl_mtx, which is held
    during vcl_BackendEvent() to ensure a consistent view of the director
    list. Because of the early return from VRT_AddDirector() this likely
    only happened in VTC mode, but the underlying race existed
    nevertheless.
    
    This patch _almost_ fixes the issue with the intend of making it
    highly unlikely to occur without getting too involved with the vcl
    temperature controls: We now check the same conditions under which
    vcl_set_state() would transition the temperature to COOLING and, if
    they apply, use Lck_Trylock() in a try/wait loop instead of
    Lck_Lock(), avoiding the deadlock.
    
    The patch presumably still does not fix the problem entirely, because
    the reads of vcl->busy and vcl->temp before the Lck_Trylock() could
    still be outdated. With the temperature controls otherwise unchanged,
    the only alternative idea I could come up with was to always use a
    try/wait loop, which I dismissed due to the performance impact
    (overhead and added latency).
    
    Ref https://github.com/nigoroll/libvmod-dynamic/issues/110

diff --git a/bin/varnishd/cache/cache_vrt_vcl.c b/bin/varnishd/cache/cache_vrt_vcl.c
index a454b14d9..82e6e10a5 100644
--- a/bin/varnishd/cache/cache_vrt_vcl.c
+++ b/bin/varnishd/cache/cache_vrt_vcl.c
@@ -219,8 +219,25 @@ VRT_AddDirector(VRT_CTX, const struct vdi_methods *m, void *priv,
 	 * not be legal to do so. Because we change the VCL temperature before
 	 * sending COLD events we have to tolerate and undo attempts for the
 	 * COOLING case.
+	 *
+	 * To avoid deadlocks during vcl_BackendEvent, we only wait for vcl_mtx
+	 * if the vcl is busy (ref vcl_set_state())
 	 */
-	Lck_Lock(&vcl_mtx);
+
+	while (1) {
+		temp = vcl->temp;
+		if (temp == VCL_TEMP_COOLING)
+			return (vcldir_surplus(vdir));
+		if (vcl->busy == 0 && vcl->temp->is_warm) {
+			if (! Lck_Trylock(&vcl_mtx))
+				break;
+			usleep(10 * 1000);
+			continue;
+		}
+		Lck_Lock(&vcl_mtx);
+		break;
+	}
+	Lck_AssertHeld(&vcl_mtx);
 	temp = vcl->temp;
 	if (temp != VCL_TEMP_COOLING)
 		VTAILQ_INSERT_TAIL(&vcl->director_list, vdir, list);


More information about the varnish-commit mailing list