Discussion:
[stunnel-users] Effect of SIGHUP on connections
Philippe Anctil
2016-02-11 20:10:03 UTC
Hello,

I have been testing whether sending SIGHUP to stunnel has adverse effects on
existing connections (in fork mode, v5.19). I have found that new connections
may fail to complete.

In real-world scenarios this is not a big issue. First, you will typically
reload only once. Second, you will probably do it during off-peak hours. The
chances of impacting someone are very low.

In any case, I wanted to understand the circumstances leading to failed
connections.

My test is simple: I try to establish 10 connections at 1-second intervals.
At the same time, I generate a lot of SIGHUP signals. At least one
connection fails every time.

Even with debug logging turned on, I can't find any indication of the failed
connections in the stunnel log.

But the real mystery is this: if I run a TCP capture, I see
established connections.

29 2016-02-10 12:07:14.304987 0.000000 142.168.148.114 142.168.66.111 TCP 66 56572→4443 [SYN] Seq=0 Win=8192 Len=0 MSS=1260 WS=256 SACK_PERM=1
30 2016-02-10 12:07:14.305001 0.000014 142.168.66.111 142.168.148.114 TCP 66 4443→56572 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 SACK_PERM=1 WS=128
31 2016-02-10 12:07:14.312908 0.007907 142.168.148.114 142.168.66.111 TCP 60 56572→4443 [ACK] Seq=1 Ack=1 Win=66560 Len=0
32 2016-02-10 12:07:14.312918 0.000010 142.168.66.111 142.168.148.114 TCP 54 4443→56572 [RST] Seq=1 Win=0 Len=0
33 2016-02-10 12:07:14.313097 0.000179 142.168.148.114 142.168.66.111 SSL 371 Client Hello
34 2016-02-10 12:07:14.313102 0.000005 142.168.66.111 142.168.148.114 TCP 54 4443→56572 [RST] Seq=1 Win=0 Len=0

How could a TCP connection be established and yet leave no indication of
it in stunnel.log?

Thanks!
--
Philippe Anctil
Michal Trojnara
2016-02-12 16:52:35 UTC
Post by Philippe Anctil
My test is simple: I try to establish 10 connections at 1-second
intervals. At the same time, I generate a lot of SIGHUP signals.
I presume you were only sending the signals to the main process. Right?
Post by Philippe Anctil
At least one connection fails every time.
30 2016-02-10 12:07:14.305001 0.000014 142.168.66.111 142.168.148.114 TCP 66 4443→56572 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 SACK_PERM=1 WS=128
31 2016-02-10 12:07:14.312908 0.007907 142.168.148.114 142.168.66.111 TCP 60 56572→4443 [ACK] Seq=1 Ack=1 Win=66560 Len=0
32 2016-02-10 12:07:14.312918 0.000010 142.168.66.111 142.168.148.114 TCP 54 4443→56572 [RST] Seq=1 Win=0 Len=0
So it sends an RST 0.01 ms after receiving the final ACK of the TCP
handshake. My theory is that this is caused by the listening socket
being momentarily closed in the middle of the TCP handshake, i.e.
before the kernel informs userspace (stunnel in this case) about
the newly established TCP connection.

The solution would require identifying the listening sockets defined
in both the old and the new configuration file (i.e. before and after
the reload), and caching their file descriptors instead of reopening
them. This would introduce a relatively complex (and thus
error-prone) piece of code just to keep accepting new connections
during a configuration file reload.
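The approach described above, matching listeners across the old and new
configuration and reusing their descriptors, could be sketched roughly as
follows. This is a hypothetical Python illustration of the idea, not
stunnel's actual C code; the function name and data layout are invented:

```python
import socket

def reload_listeners(old, new_addrs):
    """Keep listening sockets whose (host, port) survives the reload,
    open sockets for addresses new to the configuration, and close
    the dropped ones.  `old` maps (host, port) -> listening socket."""
    kept = {}
    for addr in new_addrs:
        if addr in old:
            # Cached descriptor: this socket never stops listening, so
            # no half-open connection is invalidated by the reload.
            kept[addr] = old.pop(addr)
        else:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind(addr)
            s.listen(128)
            kept[addr] = s
    for s in old.values():  # addresses removed from the new configuration
        s.close()
    return kept

# Hypothetical reload: the one surviving address keeps its socket.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 0))
s.listen(8)
addr = s.getsockname()
listeners = reload_listeners({addr: s}, [addr])
print("socket reused:", listeners[addr] is s)
```

The complexity Mike alludes to comes from comparing addresses across
configurations correctly (wildcards, IPv6, multiple services), which this
sketch glosses over.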

I might only consider implementing this for a paying customer.
Post by Philippe Anctil
How could a tcp connection be established and yet find no
indication of that in stunnel.log?
According to my theory this is because the userspace never gets
notified about this half-established connection.

Best regards,
Mike
Philippe Anctil
2016-02-12 22:39:37 UTC
Post by Philippe Anctil
My test is simple. I try to establish 10 connections at 1 second
interval. At the same time, I generate a lot of sighup signals.
I presume you were only sending the signals to the main process. Right?
Correct.
Post by Philippe Anctil
At least one connection fails every time.
30 2016-02-10 12:07:14.305001 0.000014 142.168.66.111 142.168.148.114 TCP 66 4443→56572 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 SACK_PERM=1 WS=128
31 2016-02-10 12:07:14.312908 0.007907 142.168.148.114 142.168.66.111 TCP 60 56572→4443 [ACK] Seq=1 Ack=1 Win=66560 Len=0
32 2016-02-10 12:07:14.312918 0.000010 142.168.66.111 142.168.148.114 TCP 54 4443→56572 [RST] Seq=1 Win=0 Len=0
So it sends an RST 0.01 ms after receiving the final ACK of the TCP
handshake. My theory is that this is caused by the listening socket
being momentarily closed in the middle of the TCP handshake, i.e.
before the kernel informs userspace (stunnel in this case) about
the newly established TCP connection.
I think you are right.

I thought the TCP connection was established during the call to accept().
Because of this, I concentrated my research around that call and simply
could not understand why none of the traces I had put around it were triggered.

Well, that's because the situation arises while the program is looping
somewhere else, most likely in daemon_loop, for instance after receiving a
signal or after accepting a connection. Because the socket is still
listening, the OS lets a new TCP connection come in. If a SIGHUP is about
to be pulled from signal_pipe, it is entirely possible that the socket
gets closed right after the new TCP connection is established and before it
can be accept()ed by the program. The socket is closed without checking
whether there are pending connections, and so there can't be any indication
of them in the log.
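This race can be reproduced outside stunnel with a few lines of Python.
The sketch below assumes Linux behavior, where closing a listening socket
resets any connection still waiting in its backlog queue:

```python
import socket
import time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(5)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())  # handshake completes in the kernel;
                                # the connection sits in the backlog queue
cli.sendall(b"Client Hello")    # data for a connection nobody accept()ed

srv.close()                     # the "reload": listener closed before accept(2)
time.sleep(0.2)                 # let the RST reach the client

cli.settimeout(2)
try:
    reset = cli.recv(1024) == b""   # EOF would also mean the server is gone
except ConnectionResetError:
    reset = True                    # Linux: the queued connection was reset
print("connection reset:", reset)
cli.close()
```

As in the capture above, the server side logs nothing: from the
application's point of view the connection never existed.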

Note that I am not claiming this situation could or should be detected.
I was mostly interested in understanding it.

Thanks for your input!
--
Philippe Anctil
Michał Trojnara
2016-02-13 11:17:18 UTC
Post by Philippe Anctil
Well, that's because the situation is happening when the program
loops somewhere else, most likely in daemon_loop.
This is not how BSD sockets work. The interface between applications
and the kernel resembles function calls, but the kernel is nothing like
a library. When you call listen(2), the TCP state machine in the kernel
starts accepting new connections. The kernel keeps an internal backlog
queue of the *successfully established connections*; the size of this
queue is specified as the second parameter of listen(2). The file
descriptor of a newly established connection can then be retrieved
from the backlog queue with accept(2). The simplest way to wait for a
new connection is to execute a blocking accept(2). This is, however,
*not* the way stunnel does it. Instead, stunnel awaits incoming
connections on multiple accepting sockets using poll(2), and only then
retrieves the file descriptors of the newly established connections.
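The key point, that the kernel and not accept(2) completes the handshake,
is easy to demonstrate (a minimal Python sketch): the client's connect(2)
returns before the server ever calls accept(2).

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(5)                   # the kernel now completes handshakes itself

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())  # returns: the TCP handshake is done,
                                # even though accept(2) was never called
print("connected before accept")

conn, peer = srv.accept()       # merely dequeues the established connection
print("dequeued:", peer == cli.getsockname())
```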

Normally, accepting a new connection works as follows:
(initial connection state: closed)
- The kernel receives a SYN packet.
- The kernel sends the SYN+ACK reply.
(new connection state: half-open)
- The kernel receives the final ACK packet.
(new connection state: open)
- The kernel signals that a new connection is available on the
listening socket.
- stunnel's poll(2) call returns, indicating a new open connection.
- stunnel retrieves the file descriptor with accept(2).
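The steps above can be sketched with poll(2), the same pattern stunnel
uses (a minimal Python illustration, with the client in the same process
for self-containment):

```python
import select
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(16)
srv.setblocking(False)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())  # kernel completes the handshake and
                                # queues the connection in the backlog

# The listening socket becomes readable once the backlog is non-empty.
poller = select.poll()
poller.register(srv.fileno(), select.POLLIN)
events = poller.poll(2000)      # wait up to 2 s
assert events, "no connection signalled"

conn, peer = srv.accept()       # retrieve the new file descriptor
print("accepted", peer is not None)
```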

In your case:
(initial connection state: closed)
- The kernel receives a SYN packet.
- The kernel sends the SYN+ACK reply.
(new connection state: half-open)
- stunnel closes the listening socket, invalidating any half-open
connections.
(new connection state: closed)
- The kernel receives the final ACK packet.
- The kernel sends back RST, because the ACK packet does not match
any connection known to the kernel's TCP state machine.

I guess the part that confused you is that the server's TCP handshake
is only complete after *receiving* the final ACK, while the client's
handshake is complete after *sending* it. Consequently, while your
client reports the connection as opened and then quickly reset, the
server-side connection is never opened (the server's handshake is
never completed).

Best regards,
Mike
