Dropped connections with tcp_tw_recycle=1

Sven Ulland sveniu at opera.com
Mon Sep 21 21:06:10 CEST 2009


Nils Goroll wrote:
>> tcp_tw_recycle is incompatible with NAT on the server side
> 
> ... because it will enforce the verification of TCP time stamps.
> Unless all clients behind a NAT (actually PAT/masquerading) device
> use identical timestamps (within a certain range), most of them will
> send invalid TCP timestamps so SYNs will get dropped.

I've been digging a bit more. The drops happen because PAWS thinks
they are "old duplicate segments from earlier incarnations of the
connection".

A new incoming connection request will eventually call
tcp_ipv4.c:tcp_v4_conn_request(), where we find the following code
that ends up dropping some SYNs if recycling is enabled:

if (tmp_opt.saw_tstamp &&
    tcp_death_row.sysctl_tw_recycle &&
    (dst = inet_csk_route_req(sk, req)) != NULL &&
    (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
    peer->v4daddr == saddr) {
        if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
            (s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW) {
                NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
                goto drop_and_release;
        }
}

The outer conditional verifies that the incoming SYN has a timestamp,
that tcp_tw_recycle is enabled, and that the origin exists in our
peer cache. Note that it only checks the IP of the origin. Doesn't it
make sense to also match on port?

The inner conditional tests two things: First, that the peer's last
seen timestamp has not expired (it expires after 60 seconds
[TCP_PAWS_MSL], since get_seconds() counts seconds, not timestamp
ticks). Next, that the new incoming timestamp [req->ts_recent] is more
than one tick [TCP_PAWS_WINDOW] *before* the last seen timestamp from
the peer [peer->tcp_ts] (i.e. that it looks like an old duplicate).
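
To make the inner test easier to play with, here is a minimal userspace
sketch of it. This is not kernel code: struct peer_entry and
paws_passive_reject() are names made up for illustration, and the two
constants are assumed to have the values from include/net/tcp.h
(60 and 1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's definitions in include/net/tcp.h. */
#define TCP_PAWS_MSL    60  /* seconds a cached per-host timestamp stays valid */
#define TCP_PAWS_WINDOW 1   /* tolerated backwards step, in timestamp ticks */

/* Hypothetical stand-in for the two inet_peer fields the check uses. */
struct peer_entry {
    uint32_t tcp_ts;        /* last TSval seen from this source IP */
    long     tcp_ts_stamp;  /* wall-clock second at which it was stored */
};

/*
 * Returns true if a SYN carrying syn_tsval, arriving at wall-clock
 * second `now`, would be dropped by the check quoted above.
 */
bool paws_passive_reject(const struct peer_entry *peer,
                         uint32_t syn_tsval, long now)
{
    /* The cached timestamp is only trusted for TCP_PAWS_MSL seconds. */
    bool fresh = now < peer->tcp_ts_stamp + TCP_PAWS_MSL;

    /* Casting the unsigned difference to signed copes with 32-bit
     * wraparound of the timestamp counter, as the (s32) cast does. */
    bool older = (int32_t)(peer->tcp_ts - syn_tsval) > TCP_PAWS_WINDOW;

    return fresh && older;
}
```

With a cached entry of { tcp_ts = 1000, tcp_ts_stamp = 100 }, a SYN
with TSval 998 at second 110 is rejected, one with TSval 999 is not
(the difference must exceed the window), and from second 160 onwards
nothing is rejected because the entry has expired.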

(You can also check whether you are getting these drops by watching
the PAWSPassive value in /proc/net/netstat.)
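
If you want to poll that counter from a program, something like the
following should do. It is only a sketch: it assumes the usual
/proc/net/netstat layout of paired lines (a "TcpExt:" line of field
names followed by a "TcpExt:" line of values), and lookup_counter()
and read_paws_passive() are helper names I made up.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find `name` among the space-separated tokens of `header` and store
 * the token at the same position in `values` into *out.
 * Returns 0 on success, -1 if the name is not present.
 */
int lookup_counter(const char *header, const char *values,
                   const char *name, unsigned long long *out)
{
    size_t name_len = strlen(name);
    const char *h = header, *v = values;

    while (*h && *v) {
        /* Skip separators, then measure the next token on each line. */
        h += strspn(h, " \n");
        v += strspn(v, " \n");
        size_t hl = strcspn(h, " \n");
        size_t vl = strcspn(v, " \n");
        if (hl == 0 || vl == 0)
            break;
        /* Require an exact match, not just a common prefix. */
        if (hl == name_len && strncmp(h, name, hl) == 0) {
            *out = strtoull(v, NULL, 10);
            return 0;
        }
        h += hl;
        v += vl;
    }
    return -1;
}

/*
 * Read the PAWSPassive counter from a netstat-style file.
 * Assumes each line fits in the buffers below; returns -1 on failure.
 */
int read_paws_passive(const char *path, unsigned long long *out)
{
    char header[8192], values[8192];
    int rc = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    /* Lines come in pairs: names first, then the matching values. */
    while (fgets(header, sizeof header, f) &&
           fgets(values, sizeof values, f)) {
        if (strncmp(header, "TcpExt:", 7) == 0 &&
            lookup_counter(header, values, "PAWSPassive", out) == 0) {
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

Calling read_paws_passive("/proc/net/netstat", &n) twice, a few
seconds apart, and comparing the two values shows whether SYNs are
being dropped right now.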

Here's the origin of the code, appendix B.2 (b) of RFC 1323 (Jacobson
et al.):
"""
An additional mechanism could be added to the TCP, a per-host cache of
the last timestamp received from any connection [peer->tcp_ts]. This
value [peer->tcp_ts] could then be used in the PAWS mechanism to
reject old duplicate segments [req] from earlier incarnations of the
connection, if the timestamp clock can be guaranteed to have ticked at
least once [TCP_PAWS_WINDOW] since the old connection was open.
""" -- http://tools.ietf.org/html/rfc1323#page-29

I'm wondering why the source port is not taken into consideration
here. A "previous incarnation of the connection" would surely have the
same source port? So if a new incoming connection has a different
source port, it should not be a candidate for rejection.
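
To illustrate the difference, here is a toy model of the cache lookup
with an optional port match. To be clear, this is hypothetical: the
kernel's peer cache is keyed by IP only, struct ts_slot and
syn_rejected() are invented names, and the TCP_PAWS_MSL expiry is left
out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAWS_WINDOW 1  /* assumed to mirror TCP_PAWS_WINDOW */

/* Hypothetical cache slot holding the last timestamp seen. */
struct ts_slot {
    uint32_t addr;     /* source IP the timestamp was seen from */
    uint16_t port;     /* source port; only used when match_port is set */
    uint32_t last_ts;  /* last TSval seen */
    bool     used;
};

/*
 * Returns true if a SYN from addr:port with TSval `ts` would be
 * rejected against `slot`. With match_port == false this models the
 * per-host behaviour described above; with match_port == true it
 * models the per-connection variant I am asking about.
 */
bool syn_rejected(const struct ts_slot *slot, uint32_t addr,
                  uint16_t port, uint32_t ts, bool match_port)
{
    if (!slot->used || slot->addr != addr)
        return false;
    if (match_port && slot->port != port)
        return false;
    /* Same signed comparison as in tcp_v4_conn_request(). */
    return (int32_t)(slot->last_ts - ts) > PAWS_WINDOW;
}
```

Take two clients behind one NAT address: client A (port 40000) last
sent TSval 5000000, and client B (port 40001) has a much younger
timestamp clock and sends TSval 1000. Without the port match, B's SYN
is rejected as an "old duplicate"; with it, B's SYN passes, while a
genuinely old segment from A's port is still rejected.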


The actual reuse of TIME_WAIT buckets by tcp_tw_recycle and
tcp_tw_reuse seems to happen when setting up outbound connections. I
haven't looked at those code paths yet.

Sven

