
Why does setting SO_SNDBUF and SO_RCVBUF destroy performance?

Running in Docker on macOS, I have a simple server and client set up to measure how fast the client can allocate data and send it to the server. The tests are done over loopback (within the same Docker container). The message size for my tests was 1000000 bytes.

When I set SO_RCVBUF and SO_SNDBUF to their respective defaults, the performance halves.

SO_RCVBUF defaults to 65536 and SO_SNDBUF defaults to 1313280 (retrieved by calling getsockopt and dividing by 2).
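For reference, a minimal sketch of how those values can be queried (the helper name query_buf is mine, not part of the benchmark):

#include <sys/socket.h>
#include <cstdio>

/* Hypothetical helper mirroring how the defaults above were obtained:
 * getsockopt, then divide by 2 (the kernel doubles values passed to
 * setsockopt, so the raw reading overstates the requested size). */
static int query_buf(int fd, int opt /* SO_SNDBUF or SO_RCVBUF */) {
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, SOL_SOCKET, opt, &val, &len) == -1) {
        perror("getsockopt failure");
        return -1;
    }
    return val / 2;
}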

Tests:

  • When I test setting neither buffer size, I get about 7 Gb/s throughput.
  • When I set one buffer or the other to the default (or higher) I get 3.5 Gb/s.
  • When I set both buffer sizes to the default I get 2.5 Gb/s.

Server code: (cs is an accepted stream socket)

void tcp_rr(int cs, uint64_t& processed) {
    /* Removing this entire block improves performance */
    if (setsockopt(cs, SOL_SOCKET, SO_RCVBUF, &ENV.recv_buf, sizeof(ENV.recv_buf)) == -1) {
        perror("RCVBUF failure");
        return;
    }
    char *buf = (char *)malloc(ENV.msg_size);
    while (true) {
        int recved = 0;
        while (recved < ENV.msg_size) {
            int recvret = recv(cs, buf + recved, ENV.msg_size - recved, 0);
            if (recvret <= 0) {
                if (recvret < 0) {
                    perror("Recv error");
                }
                free(buf);  /* free before the early return to avoid a leak */
                return;
            }
            processed += recvret;
            recved += recvret;
        }
    }
    /* not reached: the loop above only exits via return */
}

Client code: (s is a connected stream socket)

void tcp_rr(int s, uint64_t& processed, BenchStats& stats) {
    /* Removing this entire block improves performance */
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &ENV.send_buf, sizeof(ENV.send_buf)) == -1) {
        perror("SNDBUF failure");
        return;
    }
    char *buf = (char *)malloc(ENV.msg_size);
    while (stats.elapsed_millis() < TEST_TIME_MILLIS) {
        int sent = 0;
        while (sent < ENV.msg_size) {
            int sendret = send(s, buf + sent, ENV.msg_size - sent, 0);
            if (sendret <= 0) {
                if (sendret < 0) {
                    perror("Send error");
                }
                free(buf);  /* free before the early return to avoid a leak */
                return;
            }
            processed += sendret;
            sent += sendret;
        }
    }
    free(buf);
}

Zeroing in on SO_SNDBUF:
The default appears to come from: net.ipv4.tcp_wmem = 4096 16384 4194304 (min, default, max)

If I setsockopt to 4194304 and then getsockopt (to see what is currently set), it returns 425984 (roughly a tenth of what I requested).

Additionally, it appears that calling setsockopt sets a lock on buffer expansion (for send, the lock is named SOCK_SNDBUF_LOCK, and it prohibits adaptive expansion of the buffer). The question then is: why can't I request the full-size buffer?
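To make the clamping concrete, here is a hedged sketch of that set-then-query experiment (assuming s is a connected TCP socket, as in the client code above):

int requested = 4194304;  /* the tcp_wmem max from above */
if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) == -1)
    perror("SNDBUF failure");

int actual = 0;
socklen_t len = sizeof(actual);
if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &actual, &len) == -1)
    perror("getsockopt failure");
printf("requested %d, kernel set %d\n", requested, actual);
/* In the environment described above, this prints:
 * requested 4194304, kernel set 425984 */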

Clues to what is going on come from the kernel's handler for SO_SNDBUF (the same applies to SO_RCVBUF, but we'll focus on SO_SNDBUF below).

net/core/sock.c contains the implementations of the generic socket operations, including the SOL_SOCKET getsockopt and setsockopt handlers.

Examining what happens when we call setsockopt(s, SOL_SOCKET, SO_SNDBUF, ...) :

        case SO_SNDBUF:
                /* Don't error on this BSD doesn't and if you think
                 * about it this is right. Otherwise apps have to
                 * play 'guess the biggest size' games. RCVBUF/SNDBUF
                 * are treated in BSD as hints
                 */
                val = min_t(u32, val, sysctl_wmem_max);
set_sndbuf:
                sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
                sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
                /* Wake up sending tasks if we upped the value. */
                sk->sk_write_space(sk);
                break;

        case SO_SNDBUFFORCE:
                if (!capable(CAP_NET_ADMIN)) {
                        ret = -EPERM;
                        break;
                }
                goto set_sndbuf;
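Plugging in the observed numbers (assuming sysctl_wmem_max is 212992, as inferred in the next paragraph) reproduces the mysterious getsockopt result exactly:

val       = min(4194304, 212992)             = 212992   /* clamped by sysctl_wmem_max   */
sk_sndbuf = max(212992 * 2, SOCK_MIN_SNDBUF) = 425984   /* what getsockopt then reports */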

Some interesting things pop out.

First of all, we see that the requested value is capped at sysctl_wmem_max (exposed as net.core.wmem_max, readable via /proc/sys/net/core/wmem_max), a setting that can be hard to pin down inside a Docker container. From the context above, it is likely 212992: half of the 425984 you got back after requesting 4194304, because the kernel stores double the clamped value (the val * 2 in the snippet).

Secondly, we see SOCK_SNDBUF_LOCK being set. This flag is, in my opinion, not well documented in the man pages, but it locks out dynamic adjustment of the buffer size.

For example, in the function tcp_should_expand_sndbuf we get:

static bool tcp_should_expand_sndbuf(const struct sock *sk)
{
        const struct tcp_sock *tp = tcp_sk(sk);

        /* If the user specified a specific send buffer setting, do
         * not modify it.
         */
        if (sk->sk_userlocks & SOCK_SNDBUF_LOCK)
                return false;
...

So what is happening in your code? You attempt to set the maximum value as you understand it, but it is clamped to something roughly 10x smaller by the sysctl sysctl_wmem_max. This is then made far worse by the fact that setting the option also locks the buffer at that smaller size. The strange part is that dynamic resizing is not subject to this cap: the kernel's auto-tuning is bounded by the (much larger) tcp_wmem maximum instead, so an untouched socket can grow all the way to 4194304.

If you look at the first code snippet above, you will see the SO_SNDBUFFORCE option. It disregards sysctl_wmem_max and allows you to set essentially any buffer size, provided you have the right permissions.

It turns out that processes launched in ordinary Docker containers don't have CAP_NET_ADMIN, so to use this socket option you must run with --privileged (or grant the capability with --cap-add NET_ADMIN). However, if you do, and you force the maximum size, you will see your benchmark return the same throughput as not setting the option at all and letting the buffer grow dynamically to the same size.
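A hedged sketch of the forced variant (again assuming s is the connected client socket); without CAP_NET_ADMIN the call fails with EPERM:

int forced = 4194304;
if (setsockopt(s, SOL_SOCKET, SO_SNDBUFFORCE, &forced, sizeof(forced)) == -1) {
    perror("SNDBUFFORCE failure");  /* EPERM unless the process has CAP_NET_ADMIN */
}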
