
Java TCP/IP Socket Performance Problem

Our application is reading data very fast over TCP/IP sockets in Java. We are using the NIO library with non-blocking sockets and a Selector to indicate readiness to read. On average, the overall processing time for reading and handling the read data is sub-millisecond. However, we frequently see spikes of 10-20 milliseconds. (Running on Linux.)
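For context, a minimal sketch of the kind of Selector-driven, non-blocking read loop described above; the host name, port, and buffer size are placeholders, not the poster's actual code:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class NioReadLoop {
        public static void main(String[] args) throws IOException {
            SocketChannel channel = SocketChannel.open(new InetSocketAddress("feed.example.com", 9000));
            channel.configureBlocking(false);                    // non-blocking socket
            Selector selector = Selector.open();
            channel.register(selector, SelectionKey.OP_READ);    // notify us when data is readable

            ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);
            while (true) {
                selector.select(0);                              // block until a channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isReadable()) {
                        buffer.clear();
                        int n = ((SocketChannel) key.channel()).read(buffer);
                        if (n < 0) { key.channel().close(); continue; }
                        buffer.flip();
                        // ... decode and handle the message(s) now in buffer ...
                    }
                }
            }
        }
    }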

Using tcpdump we can see the time difference between tcpdump's reading of two discrete messages, and compare that with our application's times. We see that tcpdump appears to have no delay, whereas the application can show 20 milliseconds.

We are pretty sure this is not GC, because the GC log shows virtually no Full GC, and in JDK 6 (from what I understand) the default GC is parallel, so it should not be pausing the application threads (unless doing a Full GC).

It looks almost as if there is some delay before Java's Selector.select(0) method returns readiness to read, because at the TCP layer the data is already available to be read (and tcpdump is reading it).
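Under a steady load of thousands of messages per second the socket is rarely idle, so one cheap way to narrow this down is to time the select() call itself and the handling that follows it separately: a spike in the first number points at a late wake-up, a spike in the second at slow processing. A rough sketch (the 10 ms threshold is arbitrary):

    import java.io.IOException;
    import java.nio.channels.Selector;

    public class SelectLatencyProbe {
        // Wraps Selector.select(0) and reports wake-ups that took suspiciously long.
        // Note: on a genuinely quiet connection a long wait here is normal, so this
        // is only meaningful when data is known to be arriving continuously.
        static int timedSelect(Selector selector) throws IOException {
            long before = System.nanoTime();
            int ready = selector.select(0);                      // block until readiness
            long waitedMicros = (System.nanoTime() - before) / 1_000;
            if (waitedMicros > 10_000) {                         // more than 10 ms inside select()
                System.err.println("select() wake-up took " + waitedMicros + " us");
            }
            return ready;
        }
    }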

Additional info: at peak load we are processing about 6,000 messages per second with an average size of 150 bytes, or roughly 900 KB per second.

Eden collections still produce stop-the-world pauses, so 20 ms may be completely normal depending on the allocation behaviour, the size of the live set, and how the heap is sized.
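If young-generation pauses are suspected, one way to check without relying solely on Full GC entries in the log is to sample the collector MXBeans around a latency spike and see whether the young-generation collection count or accumulated time jumped. A minimal sketch using the standard management API:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcSampler {
        // Prints cumulative collection counts and times for every collector,
        // including the young-generation (eden) collector; call before and after
        // a latency spike and diff the numbers to see if a minor GC coincided.
        public static void dumpGcStats() {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: collections=%d, totalTimeMs=%d%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }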

Is your Java code running under RTLinux, or some other distro with hard real-time scheduling capability? If not, 10-20 msec of jitter in the processing times seems completely reasonable, and expected.

I had the same problem in a Java service that I work on. When sending the same request repeatedly from the client, the server would block at the same spot in the stream for 25-35 ms. Turning off Nagle's algorithm on the socket fixed this for me. This can be accomplished by calling setTcpNoDelay(true) on the Socket. It may result in increased network congestion because ACKs will now be sent as separate packets. See http://en.wikipedia.org/wiki/Nagle%27s_algorithm for more info on Nagle's algorithm.
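For a plain Socket or an NIO SocketChannel the option is set roughly like this (host and port are placeholders):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.SocketChannel;

    public class DisableNagle {
        public static void main(String[] args) throws IOException {
            SocketChannel channel = SocketChannel.open(new InetSocketAddress("server.example.com", 9000));
            // Disable Nagle's algorithm so small writes are sent immediately
            // instead of being buffered while waiting for an ACK.
            channel.socket().setTcpNoDelay(true);
            // On JDK 7 and later, the equivalent channel-level call is:
            // channel.setOption(java.net.StandardSocketOptions.TCP_NODELAY, true);
        }
    }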

From the tcpdump FAQ:

WHEN IS A PACKET TIME-STAMPED? HOW ACCURATE ARE THE TIME STAMPS?

In most OSes on which tcpdump and libpcap run, the packet is time stamped as part of the process of the network interface's device driver, or the networking stack, handling it. This means that the packet is not time stamped at the instant that it arrives at the network interface; after the packet arrives at the network interface, there will be a delay until an interrupt is delivered or the network interface is polled (i.e., the network interface might not interrupt the host immediately - the driver may be set up to poll the interface if network traffic is heavy, to reduce the number of interrupts and process more packets per interrupt), and there will be a further delay between the point at which the interrupt starts being processed and the time stamp is generated.

So odds are the timestamp is taken in the privileged kernel layer, and the lost 20 ms is context-switching overhead getting back to user space and into Java and the JVM's network selector logic. Without more analysis of the system as a whole I don't think it's possible to make an affirmative selection of cause.
