scheduling off the waiting list
Nils Goroll
slink at schokola.de
Mon Dec 28 22:20:15 CET 2009
Hi Poul and all,
Nils Goroll wrote:
>>> What I would really like to see is that the waitinglist gets rescheduled when
>>> the busy object actually becomes available in the cache. I suspect this has to do
>>> with calling HSH_Deref(&Parent) in HSH_Unbusy and/or the fact that HSH_Drop
>>> calls both Unbusy and Deref, but I don't understand this yet.
>> That is how it is supposed to work, and I believe, how it works.
>
> Good. Then I am either messing up this behavior with my config, or I've hit a
> corner case. I need to have a break now, but I will definitely get back to you
> on this when I have gained new insights.
I'm trying to sort my thoughts on this in public:
- A fundamental issue seems to be that the waitinglist is attached to the object
head, and if no proper match is found in the cache, we wait for whatever is to
come, even if this is not what we are going to need.
On the other hand, while the object is busy, not all selection criteria will be
known a priori (in particular not the Vary header), so this design might just be
as good as it can be.
- The only way a session can get onto the waiting list is when there is a busy
object being waited for,
- but hsh_rush is called not only when an object gets unbusied (HSH_Unbusy), but
also whenever it is dereferenced (HSH_Deref).
Call trees are:
cnt_fetch -> HSH_Unbusy->hsh_rush
               ^    |
              /     |
      HSH_Drop   (parent)
              \     |
               V    V
             HSH_Deref->hsh_rush
HSH_Deref is called from cache_expire EXP_NukeOne and exp_timer, as well as
cache_center cnt_hit (if not delivering), cnt_lookup (if it's a pass) and
cnt_deliver.
HSH_Drop is called from various functions in cache_center.
So basically there are two different scenarios when hsh_rush is called.
* Trigger delivery of an object which just got unbusied
* and trigger delivery of more sessions which did not fire in the first round
The point is that when many sessions are waiting on a busy object, there are
many reasons for those to be rescheduled even if the object they are waiting for
has not yet become available - in particular as many different objects may live
under the object head.
I think we need to change that.
The only reason we need to call hsh_rush outside the cnt_fetch->HSH_Unbusy case
is that we have the rush_exponent, which limits the number of sessions
rescheduled by each hsh_rush call. So one option would be to do away with the
rush_exponent and let all the waiters loose at once. This would also solve the
case where, by the time a session gets its thread, the cached content has been
invalidated, so it would itself fetch again.
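
To illustrate the trade-off, here is a rough sketch of a rush that wakes at
most a fixed number of waiters per call. It is a simplified model, not the
actual Varnish code: struct waiter, struct waitinglist and rush() are
hypothetical names, and a limit of 0 is assumed to mean "no limit", i.e. the
behaviour without a rush_exponent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; not Varnish's real data structures. */
struct waiter {
	int		id;
	struct waiter	*next;
};

struct waitinglist {
	struct waiter	*head;	/* sessions parked on the object head */
};

/*
 * Take at most `limit` waiters off the list and return how many were
 * released. A limit of 0 means "no limit": everyone is released at once.
 */
static unsigned
rush(struct waitinglist *wl, unsigned limit)
{
	unsigned woken = 0;

	assert(wl != NULL);
	while (wl->head != NULL && (limit == 0 || woken < limit)) {
		struct waiter *w = wl->head;

		wl->head = w->next;
		w->next = NULL;	/* real code would reschedule the session here */
		woken++;
	}
	return (woken);
}
```

With a non-zero limit, whoever is left on the list depends on some later call
to rush(); with a limit of 0, a single call at unbusy time empties the list.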
I am not sure about an alternative solution.
When we unbusy an object, we have a good chance that it's actually worth
rescheduling waiting sessions, but for the other cases, we can't easily tell if
the session would wait again or not.
What if we noted in the object head the number of busy objects so hsh_rush would
only actually schedule sessions if there aren't any or when called from cnt_fetch?
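
A minimal sketch of that idea, under stated assumptions: struct objhead_model,
its n_busy and n_waiting fields, and the functions below are all made up for
illustration and do not exist in Varnish. The from_unbusy flag stands in for
"called from cnt_fetch", and the waiting list is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an object head; field names are illustrative only. */
struct objhead_model {
	unsigned	n_busy;		/* busy objects under this head */
	unsigned	n_waiting;	/* sessions on the waiting list */
};

/* A fetch started: one more busy object under this head. */
static void
mark_busy(struct objhead_model *oh)
{
	oh->n_busy++;
}

/*
 * Reschedule up to `limit` waiters, but only if this is the unbusy path
 * or no busy object remains. Returns the number of sessions rescheduled.
 */
static unsigned
rush_if_useful(struct objhead_model *oh, bool from_unbusy, unsigned limit)
{
	unsigned n;

	if (!from_unbusy && oh->n_busy > 0)
		return (0);	/* waiters would most likely just wait again */
	n = oh->n_waiting < limit ? oh->n_waiting : limit;
	oh->n_waiting -= n;
	return (n);
}

/* Models the unbusy path: always worth a rush. */
static unsigned
unbusy(struct objhead_model *oh, unsigned limit)
{
	assert(oh->n_busy > 0);
	oh->n_busy--;
	return (rush_if_useful(oh, true, limit));
}

/* Models the deref path: only rushes when nothing is busy. */
static unsigned
deref(struct objhead_model *oh, unsigned limit)
{
	return (rush_if_useful(oh, false, limit));
}
```

In this model a deref while another object is still busy leaves the waiters
alone, while a deref after the last unbusy keeps draining the list in
limit-sized batches.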
Any better ideas?
Thank you for reading,
Nils