
Large TCP kernel buffering causes application to fail on FIN

I'd like to reopen a previous issue that was incorrectly classified as a network engineering problem; after more testing, I think it is a real issue for programmers.

So, my application streams mp3 files from a server that I can't modify. The client reads data from the server as needed, at 160 kbit/s, and feeds it to a DAC. Let's use a file of 3.5 MB.

When the server is done sending the last byte, it closes the connection, so it sends a FIN; that seems normal practice.

The problem is that the kernel, especially on Windows, seems to buffer 1 to 3 MB of data; I assume the TCP window has fully opened.

After a few seconds, the server has sent the whole 3.5 MB and about 3 MB sit inside the kernel buffer. At this point the server has sent its FIN, which is ACKed in due time.

From the client's point of view, it keeps reading data in chunks of 20 kB and will do so for the next 3 MB / 20 kB/s ≈ 150 s before it sees the EOF.

Meanwhile the server is in FIN_WAIT_2 (and not TIME_WAIT as I initially wrote; thanks to Steffen for correcting me). Some OSes, Windows at least, seem to run a half-closed-socket timer that starts when they send their FIN and can be as small as 120 s, regardless of the actual TCP window size, BTW. Of course, after 120 s the server considers that it should have received the client's FIN, so it sends a RST. That RST causes all of the client's kernel buffer to be discarded and the application fails.

As code is required, here it is:

/* Windows / Winsock client that reproduces the issue: it fetches /data-3 over
   HTTP and consumes it at the mp3 bitrate (160 kbit/s = 20 kB/s).
   Link with ws2_32.lib. */
#include <winsock2.h>
#include <windows.h>    /* Sleep() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    SOCKET sock = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(80);

    int res = connect(sock, (const struct sockaddr *) &addr, sizeof(addr));
    if (res != 0) {
        printf("connect failed\n");
        return 1;
    }

    /* HTTP line endings are "\r\n", not "\n\r" */
    const char *get = "GET /data-3 HTTP/1.0\r\n"
                      "User-Agent: mine\r\n"
                      "Host: localhost\r\n"
                      "Connection: close\r\n"
                      "\r\n";

    int bytes = send(sock, get, (int) strlen(get), 0);
    printf("send %d\n", bytes);

    char *buf = malloc(20000);

    while (1) {
        int n = recv(sock, buf, 20000, 0);
        if (n == 0) {                       /* orderly EOF: the FIN reached the application */
            printf("normal eof at %d\n", bytes);
            break;
        }
        if (n < 0) {                        /* e.g. connection reset when the server's RST arrives */
            printf("error at %d\n", bytes);
            exit(1);
        }
        bytes += n;
        Sleep(n * 1000 / (160000 / 8));     /* consume at 20 kB/s, like the DAC would */
    }

    free(buf);
    closesocket(sock);
    WSACleanup();
    return 0;
}

It can be tested with any HTTP server.

I know there are solutions involving a handshake with the server before it closes the socket (but the server is just an HTTP server), yet this level of kernel buffering makes it a systematic failure whenever the buffers hold more data than can be consumed before the server's timeout.

The client is perfectly real-time in absorbing data. Having a larger client buffer, or no buffer at all, does not change the issue, which looks like a system design flaw to me, unless it is possible either to control the kernel buffers at the application level (not for the whole OS), or to detect the reception of a FIN at the client before recv() reports EOF. I've tried changing SO_RCVBUF, but it does not seem to influence this level of kernel buffering.
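For what it's worth, on Linux (not on the Windows client above, for which I know no direct equivalent) the FIN can be noticed before recv() returns 0: once the server's FIN has been ACKed, the client socket sits in CLOSE-WAIT even though unread data remains, and that state is visible through the TCP_INFO socket option. A minimal sketch, with peer_sent_fin() being a made-up helper name:

/* Linux-only sketch: report whether the peer's FIN has already arrived,
   i.e. the socket has reached CLOSE-WAIT, while recv() still returns
   buffered data. */
#include <netinet/in.h>     /* IPPROTO_TCP */
#include <netinet/tcp.h>    /* TCP_INFO, struct tcp_info, TCP_CLOSE_WAIT */
#include <sys/socket.h>

static int peer_sent_fin(int sock)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    if (getsockopt(sock, IPPROTO_TCP, TCP_INFO, &info, &len) != 0)
        return -1;                              /* state unknown */
    return info.tcpi_state == TCP_CLOSE_WAIT;   /* 1: FIN already received */
}

A client could call this between recv() calls and, once it returns 1, at least shutdown() its sending side so the server's half-close timer is satisfied, assuming the server then completes the close gracefully instead of resetting.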

Here is a capture of one successful and one failed exchange:

success
3684    381.383533  192.168.6.15    192.168.6.194   TCP 54  [TCP Retransmission] 9000 → 52422 [FIN, ACK] Seq=9305427 Ack=54 Win=262656 Len=0
3685    381.387417  192.168.6.194   192.168.6.15    TCP 60  52422 → 9000 [ACK] Seq=54 Ack=9305428 Win=131328 Len=0
3686    381.387417  192.168.6.194   192.168.6.15    TCP 60  52422 → 9000 [FIN, ACK] Seq=54 Ack=9305428 Win=131328 Len=0
3687    381.387526  192.168.6.15    192.168.6.194   TCP 54  9000 → 52422 [ACK] Seq=9305428 Ack=55 Win=262656 Len=0

failed
5375    508.721495  192.168.6.15    192.168.6.194   TCP 54  [TCP Retransmission] 9000 → 52436 [FIN, ACK] Seq=5584802 Ack=54 Win=262656 Len=0
5376    508.724054  192.168.6.194   192.168.6.15    TCP 60  52436 → 9000 [ACK] Seq=54 Ack=5584803 Win=961024 Len=0
6039    628.728483  192.168.6.15    192.168.6.194   TCP 54  9000 → 52436 [RST, ACK] Seq=5584803 Ack=54 Win=0 Len=0

Here is what I think the cause is; thanks very much to Steffen for putting me on the right track.

  • an mp3 file is 3.5 MB at 160 kbit/s = 20 kB/s
  • the client reads it at exactly the required speed, 20 kB/s; let's say one recv() of 20 kB per second, with no pre-buffering for simplicity
  • some OSes, like Windows, can have very large TCP kernel buffers (about 3 MB or more), and with a fast connection the TCP window is wide open
  • in a matter of seconds, the whole file is sent to the client; let's say about 3 MB end up in the kernel buffers
  • as far as the server is concerned, everything has been sent and acknowledged, so it does a close()
  • the close() sends a FIN to the client, which responds with an ACK, and the server enters the FIN_WAIT_2 state
  • BUT, at that point, from the client's point of view, every recv() will have plenty to read for the next 150 s before it sees the EOF!
  • so the client will not do a close() and thus will not send a FIN
  • the server is in FIN_WAIT_2 state and, according to the TCP specs, it should stay like that forever
  • however, various OSes (Windows at least) start a timer similar to TIME_WAIT (120 s) either when they issue the close() or when they receive the ACK of their FIN, I don't know which (in fact Windows has a specific registry entry for that, AFAIK). This is to deal more aggressively with half-closed sockets.
  • of course, after 120 s the server still has not seen the client's FIN and sends a RST
  • that RST is received by the client, causes an error there, and all the remaining data in the TCP buffers is discarded and lost
  • of course, none of that happens with high-bitrate formats, because the client consumes the data fast enough that the kernel TCP buffers are never left idle for 120 s, and it may not happen at low bitrates either when the application's own buffering reads everything up front. It takes the bad combination of bitrate, file size and kernel buffer size (see the back-of-the-envelope check after this list)... hence it does not happen all the time.
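Here is that back-of-the-envelope check of the timing in the list above; the 3 MB and 120 s figures come from the post and the capture, they are not measured by this snippet:

#include <stdio.h>

int main(void)
{
    double buffered   = 3.0 * 1000 * 1000;   /* bytes parked in the client's kernel buffer */
    double drain_rate = 160000.0 / 8.0;      /* 160 kbit/s = 20 kB/s, the mp3 bitrate */
    double drain_time = buffered / drain_rate;
    double fin_wait_2 = 120.0;               /* server's half-close timer seen in the capture */

    printf("time to drain the kernel buffer: %.0f s\n", drain_time);   /* 150 s */
    printf("%s\n", drain_time > fin_wait_2
           ? "RST arrives before the client reaches EOF: buffered data is lost"
           : "client reaches EOF before the timer fires: no problem");
    return 0;
}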

That's it. It can be reproduced with a few lines of code and any HTTP server. This can be debated, but I see it as a systemic OS issue. Now, the solution that seems to work is to force the client's receive buffer (SO_RCVBUF) to a lower value, so that the server has little chance of having sent all the data and of that data sitting in the client's kernel buffers for too long. Note that this can still happen if the buffer is 20 kB and the client consumes it at 1 B/s... hence I call it a systemic failure. I agree that some will see it as an application issue.
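A minimal sketch of that workaround, meant to slot into the client above (same includes) right after socket() and before connect(); the 64 kB value is an arbitrary assumption (roughly 3 s of audio at 20 kB/s), and whether the stack really treats SO_RCVBUF as a hard cap on the advertised window, especially if set after the handshake, varies between systems:

/* Cap the kernel receive buffer so the server cannot park minutes of audio
   there. Call it before connect() so the window negotiated in the handshake
   already reflects the smaller buffer. */
static int cap_receive_buffer(SOCKET sock)
{
    int rcvbuf = 64 * 1024;     /* assumption: ~3 s of audio at 20 kB/s */
    return setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                      (const char *) &rcvbuf, sizeof(rcvbuf));
}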


 