Understand "hit for pass" cache objects

Mon Feb 15 22:56:01 CET 2010

Hello,

I have just started using Varnish 2.0.6 in the past week as a 
replacement for Squid. So far, I love the fine-grained control you have 
over what goes into cache (as opposed to Squid's "I'll cache it when I 
feel it's supposed to be cached, but not tell you why" approach). That 
said, I'm trying to better understand the "hit for pass" cache objects 
that Varnish will sometimes create. Here is the basic flow of my VCL (much 
of it is based on the concepts on the intro page: 
http://varnish-cache.org/wiki/Introduction)

vcl_recv:
Default action is "lookup". Action changes to "pass" if ...
* Cache-Control or Pragma header contains "no-cache"
* HTTP auth is in use (Authorization header)
* Request contains cookie "bypass_cache=true"
* Request type is not GET, HEAD, POST, PUT, TRACE, OPTIONS, DELETE
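
As a rough sketch, the vcl_recv rules above might look like this in 
Varnish 2.0-style VCL. The regular expressions, the ordering of the 
checks, and the use of the return (...) form are assumptions, not the 
original configuration; older 2.0 releases may expect the bare keyword 
form (pass; / lookup;) instead.

```vcl
sub vcl_recv {
    # Client explicitly asked for an uncached response.
    if (req.http.Cache-Control ~ "no-cache" ||
        req.http.Pragma ~ "no-cache") {
        return (pass);
    }

    # HTTP auth is in use.
    if (req.http.Authorization) {
        return (pass);
    }

    # Opt-out cookie set by the application.
    if (req.http.Cookie ~ "bypass_cache=true") {
        return (pass);
    }

    # Any request method outside the listed set.
    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "POST" &&
        req.request != "PUT" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE") {
        return (pass);
    }

    # Default action: look the request up in the cache.
    return (lookup);
}
```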

vcl_fetch:
Default action is "deliver". Action changes to "pass" if ...
* Response is deemed uncacheable (!obj.cacheable)
* Response contains Cache-Control headers that say "no-cache"
* HTTP auth is in use (Authorization header)
* Request contains cookie "bypass_cache=true"
* Response contains Set-Cookie header
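
The vcl_fetch rules above could be sketched the same way. This assumes 
the Varnish 2.0 variable names (obj.* in vcl_fetch, which later releases 
renamed); the regular expressions are illustrative assumptions, not the 
original configuration.

```vcl
sub vcl_fetch {
    # Varnish itself considers the response uncacheable.
    if (!obj.cacheable) {
        return (pass);
    }

    # Backend asked for the response not to be cached.
    if (obj.http.Cache-Control ~ "no-cache") {
        return (pass);
    }

    # HTTP auth is in use on the request.
    if (req.http.Authorization) {
        return (pass);
    }

    # Opt-out cookie on the request.
    if (req.http.Cookie ~ "bypass_cache=true") {
        return (pass);
    }

    # Backend is setting a cookie (e.g. starting a session).
    if (obj.http.Set-Cookie) {
        return (pass);
    }

    # Default action: insert into the cache and deliver.
    return (deliver);
}
```

Each pass returned from this subroutine is a point where a "hit for 
pass" object can be created.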

Now on to the problem at hand. My understanding (please correct any 
errors) of the "hit for pass" object is that any time the action is 
"pass" within vcl_fetch, Varnish will create a "hit for pass" object to 
make future requests for the same URL hash go straight to the back end 
instead of lining them up serially and waiting for a response from the 
first request. Until that object's TTL expires, the "hit for pass" 
object will remain in cache and never be replaced with a fresh object 
from the backend.

Here is what is happening in my example.

Client A visits the URL http://www.example.com/. Since this is the first 
time they visit the site, the backend code tries to start a session (PHP 
code), which sends a Set-Cookie header in the response. In vcl_fetch, 
Varnish sees the Set-Cookie header and issues the "pass" action. Now 
there is a "hit for pass" cache object with a TTL based upon the 
Cache-Control/Expires headers or the default TTL (let's assume 120 seconds).

Client B visits the same URL http://www.example.com/. Varnish finds a 
"hit for pass" object in the cache, so it sends the request directly to 
the backend. This same thing will continue for any future clients until 
120 seconds have elapsed.

Herein lies my dilemma. A request for the same URL 
(http://www.example.com/) is sometimes cacheable and sometimes not 
cacheable (it usually depends on whether it's the first time a user 
visits the site and the Set-Cookie header has to be sent). What this 
means is if I have a very heavily hit URL as a landing page from Google, 
most of the time there will be a "hit for pass" cache object in Varnish, 
since most responses for that page will carry a Set-Cookie header. The 
only time it will cache the page is if I'm lucky and someone visits the 
page while there is no "hit for pass" cache object and their request 
doesn't result in a "pass" action from vcl_fetch.

In my situation, I think I could avoid this problem altogether if I 
could make Varnish store a DIFFERENT set of headers in the cache object 
than the headers returned to the client. For example, if I receive a 
response with a Set-Cookie header, I would remove the Set-Cookie header 
from the soon-to-be-cached object (so it wouldn't serve that header up 
for everyone), but LEAVE the Set-Cookie header for the individual that 
made the original request. This would allow the page to cache normally 
even if the only requests going to that page result in a Set-Cookie 
header. However, from what I've been able to see, there is no way to do 
this.
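
To illustrate the limitation, here is a sketch of the closest thing that 
seems expressible in Varnish 2.0-style VCL. It assumes, as far as I 
understand it, that the object modified in vcl_fetch is the same one 
that is both stored and delivered to the client that triggered the 
fetch.

```vcl
sub vcl_fetch {
    # Stripping the header makes the response cacheable, but the
    # stripped object is also what the first client receives, so
    # that client never gets its session cookie.
    if (obj.http.Set-Cookie) {
        unset obj.http.Set-Cookie;
    }
    return (deliver);
}
```

This caches the page, but at the cost of dropping the cookie for 
everyone, which is not the behaviour described above.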

Does anyone have any recommendations to get around this? In a perfect 
world, my caching server would work this way:

* vcl_recv: If any criteria from A through D are met, don't pull this 
request from cache and go to the backend
* vcl_fetch: If any criteria from E through G are met, send the object 
straight to the client without touching the cache.

The "without touching the cache" portion seems to be where I am falling 
down.

-- 
Justin Pasher