简体   繁体   中英

Java IOException: No buffer space available while sending UDP packets on Linux

I have a third party component which tries to send too many UDP messages to too many separate addresses in a certain situation. This is a burst which happens when the software is started and the situation is temporary. I'm actually not sure is it the plain amount of the messages or the fact that each of them go to a separate IP address.

Anyway, changing the underlying protocol or the problematic component is not an option, so I'm looking for a workaround. The StackTrace looks like this:

java.io.IOException: No buffer space available
    at java.net.PlainDatagramSocketImpl.send(Native Method)
    at java.net.DatagramSocket.send(DatagramSocket.java:612)

This issue occurs (at least) with Java versions 1.6.0_13 and 1.6.0_10 and Linux versions Ubuntu 9.04 and RHEL 4.6.

Are there any Java system properties or Linux configuration tweaks which might help?

I've finally determined what the issue is. The Java IOException is misleading since it is "No buffer space available" but the root issue is that the local ARP table has been filled. On Linux, the default ARP table lookup is 1024 (files /proc/sys/net/ipv4/neigh/default/gc_thresh1, /proc/sys/net/ipv4/neigh/default/gc_thresh2, /proc/sys/net/ipv4/neigh/default/gc_thresh3).

What was happening in my case (and I assume your case), is that your Java code is sending out UDP packets from an IP address that is in the same subnet as your destination addresses. When this is the case, the Linux machine will perform an ARP lookup to translate the IP address into the hardware MAC address. Since you are blasting out packets to many different IPs the local ARP table fills up quickly, hits 1024, and that is when the Java exception is thrown.

The solution is simple, either increase the limit by editing the files I mentioned earlier, or move your server into a different subnet than your destination addresses, which then causes the Linux box to no longer perform neighbor ARP lookups (instead will be handled by a router on the network).

When sending lots of messages, especially over gigabit ethernet in Linux, the stock parameters for your kernel are usually not optimal. You can increase the Linux kernel buffer size for networking through:

echo 1048576 > /proc/sys/net/core/wmem_max
echo 1048576 > /proc/sys/net/core/wmem_default
echo 1048576 > /proc/sys/net/core/rmem_max
echo 1048576 > /proc/sys/net/core/rmem_default

As root.

Or use sysctl

sysctl -w net.core.rmem_max=8388608 

There are tons of network options

See Linux Network Tuning by IBM and More tuning information

Might be a bit complicated but as I know, Java uses the SPI 1 pattern for the network sub-library. This allows you to change the implementation used for various network operations. If you use OpenJDK then you could gain some hints how and what to wrap with your implementation. Then, in your implementation you slow down the I/O with some sleeps for example.

Or, just for fun, you could override the default DatagramSocket with your modified implementation. Have the same package name for it and - as I know - it will take precedence over the default JRE class. At least this method worked for me on some buggy 3rd party library.

Edit:

1 Service Provider Interface is a method to separate client and service code within an API. This separation allows different client and different provider implementations. Can be recognized from the name ending in Impl usually, just like in your stack trace java.net.PlainDatagramSocketImpl is the provider implementation where the DatagramSocket is the client side API.

You commented that you don't want to slow down the communication the entire way. There are several hacks to avoid it, for example measure the time in your code and slow the communication within the first 1-2 minutes starting at your first incoming method call. Then you can skip the sleep.

Another option would be to identify the misbehaving class in the library, JAD it and fix it. Then replace the original class file in the library.

当我尝试使用WIFI连接到数据库在两个本地JVM中运行coherence群集时,我遇到了此错误。如果我使用以太网运行它 - 它运行良好。

I'm also currently seeing this problem as well with both Debian & RHEL. At this point I believe I've isolated it down to the NIC and/or the NIC driver. What hardware configuration do you have this also exhibits this problem? This seems to only occur on new Dell PowerEdge servers that we recently acquired that have Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet NICs.

I too can confirm that it is the rapid generation of outbound UDP packets to many different IP addresses in a short window. I've attempted to write a simple Java application that can reproduce it (since ours is occurring with snmp4j).

EDIT

Look at my answer here: Java IOException: No buffer space available while sending UDP packets on Linux

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM