
TCP Sockets send buffer size efficiency

When working with WinSock or POSIX TCP sockets (in C/C++, so no extra Java/Python/etc. wrapping), are there any efficiency pros/cons to building up a larger buffer (say up to 4 KB) in user space and then making as few send calls as possible to transmit it, versus making multiple smaller calls directly with the individual pieces of data (say 1-1000 bytes), apart from the fact that for non-blocking/asynchronous sockets the single buffer is potentially easier for me to manage?
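To make the comparison concrete, here is roughly what I mean by the two approaches (a rough sketch using POSIX send; the 4 KB size and the helper names are just illustrative, and error/overflow handling is omitted):

    #include <string.h>
    #include <sys/socket.h>

    /* Approach A: one send call per piece of data. */
    static void send_pieces(int fd, const char *pieces[], const size_t lens[], int n)
    {
        for (int i = 0; i < n; i++)
            send(fd, pieces[i], lens[i], 0);     /* one syscall per piece */
    }

    /* Approach B: accumulate into a user-space buffer, then send once. */
    static void send_batched(int fd, const char *pieces[], const size_t lens[], int n)
    {
        char buf[4096];                          /* the 4 KB mentioned above */
        size_t used = 0;
        for (int i = 0; i < n; i++) {
            if (used + lens[i] > sizeof buf)     /* sketch only: no real overflow handling */
                break;
            memcpy(buf + used, pieces[i], lens[i]);
            used += lens[i];
        }
        send(fd, buf, used, 0);                  /* single syscall for the whole batch */
    }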

I know small buffers are not recommended with recv, but I couldn't find anything about sending.

E.g. does each send call on common platforms go into kernel mode? Could a 1-byte send actually result in a 1-byte packet being transmitted under normal conditions?

As explained in TCP/IP Illustrated, Volume 1 by Richard Stevens, TCP divides the send buffer into segments of close to optimum size, so that each fits within the maximum packet size along the path to the other TCP peer. That means it will never try to send segments that would be fragmented by IP on the route to the destination (when a packet marked "don't fragment" cannot be forwarded by some router without fragmentation, that router sends back an ICMP "fragmentation needed" message, and TCP takes it into account to reduce the MSS for this connection). That said, there is no need for a buffer larger than the maximum packet size of the link-level interfaces you'll have along the path. Having one that is, say, two or three times larger ensures that TCP will not stop sending as soon as it receives an acknowledgment from the remote peer simply because its buffer has run out of data.
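As a side note, you can usually ask the stack what segment size it is actually using for a connected socket. A small sketch (TCP_MAXSEG is available on Linux and most BSD-derived stacks):

    #include <stdio.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Print the MSS the kernel is currently using for a connected TCP socket. */
    static void print_mss(int fd)
    {
        int mss = 0;
        socklen_t len = sizeof mss;
        if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) == 0)
            printf("current MSS: %d bytes\n", mss);
    }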

Consider that the usual interface type is Ethernet, with a maximum packet size of 1500 bytes, so normally TCP doesn't send a segment larger than that. And it normally has an internal buffer of around 8 KB per connection, so there is little point in adding buffer space in kernel space for that reason alone (if that were the only reason to have a buffer in kernel space).
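The kernel send buffer can be inspected, and often enlarged, with SO_SNDBUF. A small sketch (the 64 KB value is just an example; actual defaults vary by platform and are often larger than 8 KB on modern systems):

    #include <stdio.h>
    #include <sys/socket.h>

    /* Query and (optionally) enlarge the kernel send buffer for one socket. */
    static void show_and_grow_sndbuf(int fd)
    {
        int size = 0;
        socklen_t len = sizeof size;
        if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &len) == 0)
            printf("kernel send buffer: %d bytes\n", size);

        int wanted = 64 * 1024;                           /* illustrative value */
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &wanted, sizeof wanted);
    }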

Of course, other factors can force you to use a buffer in user space (for example, you need to keep the data destined for your peer somewhere, since there is only about 8 KB of buffer space in the kernel, and you need more room to be able to get on with other work). An example: ircd (the Internet Relay Chat daemon) uses write buffers of up to 100 KB per connection before dropping it because the other side is not receiving/acknowledging that data. If you simply write(2) to the connection, you will be put to sleep once the kernel buffer is full, and perhaps that's not what you want.
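A minimal sketch of that idea, assuming a non-blocking socket: keep unsent data in a user-space buffer, hand the kernel as much as it will accept right now, and keep the rest for later (the 100 KB cap mirrors the ircd example; error handling and the surrounding event loop are omitted):

    #include <errno.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    struct out_buf {
        char   data[100 * 1024];   /* cap like ircd's ~100 KB write buffer */
        size_t used;
    };

    /* Flush as much queued data as the kernel will take right now.
     * Returns 0 if we made progress or must wait, -1 on a real error. */
    static int flush_out(int fd, struct out_buf *ob)
    {
        while (ob->used > 0) {
            ssize_t n = send(fd, ob->data, ob->used, 0);
            if (n > 0) {
                memmove(ob->data, ob->data + n, ob->used - (size_t)n);
                ob->used -= (size_t)n;
            } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                return 0;          /* kernel buffer full: try again later */
            } else {
                return -1;         /* real error: drop the connection */
            }
        }
        return 0;
    }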

The reason to have buffers in user space is that TCP also performs flow control, so when it cannot send data, that data has to be kept somewhere. You have to decide whether your process should store that data up to some limit, or whether it can simply block until the receiver is able to receive again. The buffer size in kernel space is limited and normally outside the user's/developer's control; the buffer size in user space is limited only by the resources available to it.

Receiving/sending small chunks of data over a TCP connection is not advisable because of the overhead that TCP and lower-layer headers impose. Consider a telnet connection in which, for each character sent, a TCP header and an IP header are added (20 bytes minimum for TCP, 20 bytes minimum for IP, 14 bytes for the Ethernet frame header and 4 for the Ethernet CRC): that is close to 60 bytes on the wire to transmit a single character. And normally each TCP segment is acknowledged individually, so a full round-trip time is spent to send a segment and get its acknowledgment (just to be able to free the buffer resources and consider that character transmitted).
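The arithmetic makes the point. A tiny illustrative calculation (it ignores ACK traffic, the Ethernet preamble and inter-frame gaps):

    #include <stdio.h>

    int main(void)
    {
        const int hdr = 20 + 20 + 14 + 4;               /* TCP + IP + Ethernet header + CRC */
        const int payload = 1000;

        int wire_one_byte_each = payload * (hdr + 1);   /* 1000 one-byte segments */
        int wire_single_send   = hdr + payload;         /* one 1000-byte segment   */

        printf("1000 x 1-byte sends: %d bytes on the wire\n", wire_one_byte_each);
        printf("one 1000-byte send:  %d bytes on the wire\n", wire_single_send);
        return 0;
    }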

So, finally, what's the limit? It depends on your application. If you can cope with the kernel resources available and don't need more buffering, you can get by without buffers in user space. If you need more, you'll have to implement buffers and feed the kernel buffer from your own buffer as space becomes available.

Yes, a one-byte send can, under perfectly normal conditions, result in a TCP packet with only a single byte of payload being sent. Send coalescing in TCP is normally done by means of Nagle's algorithm. With Nagle's algorithm, sending data is delayed if and only if there is data that has already been sent but not yet acknowledged.

Conversely, data will be sent immediately if there is no unacknowledged data, which is usually the case in the following situations:

  • The connection has just been opened
  • The connection has been idle for some time
  • The connection only received data, but nothing was sent for some time

In that case, the first send call that your application performs will cause a packet to be sent immediately, no matter how small. So starting communication with two or more small sends is usually a bad idea because it increases overhead and delay.
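One way to avoid starting with two small sends is to hand both pieces to the kernel in a single call, either by copying them into one user-space buffer or with scatter/gather I/O. A sketch using POSIX writev (the header/body split is just an example):

    #include <sys/types.h>
    #include <sys/uio.h>

    /* Send a small protocol header and its body with one syscall instead of two. */
    static ssize_t send_header_and_body(int fd, const void *hdr, size_t hlen,
                                        const void *body, size_t blen)
    {
        struct iovec iov[2] = {
            { .iov_base = (void *)hdr,  .iov_len = hlen },
            { .iov_base = (void *)body, .iov_len = blen },
        };
        return writev(fd, iov, 2);    /* may still write partially; check the result */
    }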

The infamous "send send recv" pattern can also cause really large delays (eg on Windows typically 200ms). 臭名昭著的“发送发送接收”模式也可能导致很大的延迟(例如,在Windows上通常为200ms)。 This happens if the local TCP stack uses Nagle's algorithm (which will usually delay the second send) and the remote stack uses delayed acknowledgment (which can delay the acknowledgment of the first packet). 如果本地TCP堆栈使用Nagle算法(通常会延迟第二个发送),而远程堆栈使用延迟的确认 (这可能会延迟第一个数据包的确认),则会发生这种情况。

Since most TCP stack implementations use both Nagle's algorithm and delayed acknowledgment, this pattern is best avoided.
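If your protocol really does need back-to-back small sends followed by a read, the usual escape hatch is to disable Nagle's algorithm on that socket. A sketch for POSIX (WinSock has the same option under the same name):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Disable Nagle's algorithm so small sends are not held back waiting
     * for outstanding ACKs. Use with care: it trades bandwidth efficiency
     * for lower latency. */
    static int disable_nagle(int fd)
    {
        int on = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
    }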
