Very odd socket behavior in Java; not always closing ports?
Over the course of developing a fairly large project, we've accumulated a lot of unit tests. Many of these tests start servers, connect to them, and then close the servers and clients, usually in the same process.
However, these tests randomly fail with a "Failed to bind address 127.0.0.1:(port)" error. When the test is re-run, the error usually disappears.
At first we thought this was a problem with our tests, but we decided to write a small test in Clojure, which I'll post below (with comments for the non-Clojure people).
(ns test
  (:import [java.net Socket ServerSocket]))
(dotimes [n 10000]                            ; Run the test ten thousand times
  (let [server (ServerSocket. 10000)          ; Start a server on port 10000
        client (Socket. "localhost" 10000)    ; Connect a client to port 10000
        p (.getLocalPort client)]             ; Get the local port of the client
    (.close client)                           ; Close the client
    (.close server)                           ; Close the server
    (println "n = " n)                        ; Debug
    (println "p = " p)                        ; Debug
    (println "client = " client)              ; Debug
    (println "server = " server)              ; Debug
    (let [server (ServerSocket. p)]           ; Start a server on the local port of the client we just closed
      (.close server)                         ; Close the server
      (println "client = " client)            ; Debug
      (println "server = " server))))         ; Debug
The exception appears, at random, on the line where we start the second server. It appears that Java is holding onto the local port, even though the client on that port has already been closed.
So, my question: why on earth is Java doing this, and why is it so seemingly random?
EDIT: Someone suggested I call setReuseAddress(true) on the socket. I've done this, and nothing has changed. Here's the updated code.
(ns test
  (:import [java.net Socket ServerSocket InetSocketAddress]))
(dotimes [n 10000]                                 ; Run the test ten thousand times
  (let [server (ServerSocket.)]                    ; Create an unbound server socket
    (. server (setReuseAddress true))              ; Set the socket to reuse address
    (. server (bind (InetSocketAddress. 10000)))   ; Bind the socket to port 10000
    (let [client (Socket. "localhost" 10000)       ; Connect a client to port 10000
          p (.getLocalPort client)]                ; Get the client's local port
      (.close client)                              ; Close the client
      (.close server)                              ; Close the server
      ; (. Thread (sleep 1000))                    ; A sleep for testing
      (println "n = " n)                           ; Debug
      (println "p = " p)                           ; Debug
      (println "client = " client)                 ; Debug
      (println "server = " server)                 ; Debug
      (let [server (ServerSocket.)]                ; Create an unbound server socket
        (. server (setReuseAddress true))          ; Set the socket to reuse address
        (. server (bind (InetSocketAddress. p)))   ; Bind the socket to the local port of the client we just had
        (.close server)                            ; Close the server
        (println "client = " client)               ; Debug
        (println "server = " server)))))           ; Debug
I've also noticed that a sleep of 10 ms or even 100 ms does not prevent the problem. A sleep of 1000 ms has (so far) managed to prevent it, however.
EDIT 2: Someone put me on to SO_LINGER, but I can't find a way to set that on a ServerSocket. Does anyone have any ideas on that?
EDIT 3: It turns out that SO_LINGER is disabled by default. What else can we look at?
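For reference, a minimal Java sketch of where SO_LINGER lives: java.net.ServerSocket has no setSoLinger method, while java.net.Socket does, so the option applies to connected sockets (the client, or a socket returned by accept()). The class name LingerDemo and the use of port 0 (let the OS pick a free port) are assumptions made for this illustration, not part of the original test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class LingerDemo {
    /** Returns {SO_LINGER before any change, SO_LINGER after setSoLinger(true, 0)}. */
    static int[] lingerBeforeAndAfter() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port (an assumption of this sketch).
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            int before = client.getSoLinger();  // -1 means SO_LINGER is disabled
            client.setSoLinger(true, 0);        // enable lingering with a 0-second timeout
            int after = client.getSoLinger();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] values = lingerBeforeAndAfter();
        System.out.println("SO_LINGER before = " + values[0]);
        System.out.println("SO_LINGER after  = " + values[1]);
    }
}
```

With a linger timeout of zero, most TCP stacks reset the connection on close() instead of performing the normal shutdown, which typically skips TIME_WAIT for that socket. It also discards unsent data, so it is probably only reasonable in throwaway test code.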
UPDATE: The problem has been solved for the most part by using dynamic port allocation over a range of 10,000 or so ports. However, I'd still like to see what people can come up with.
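A related way to avoid fixed ports in tests, sketched here in Java, is to bind the server to port 0 so the operating system picks a free port, then read it back with getLocalPort(). This is not necessarily the allocation scheme described in the update; the class and method names are hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EphemeralPortDemo {
    /** Starts a server on an OS-chosen port, connects one client, and returns the server's port. */
    static int startAndConnect() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 means "any free port"; the OS decides which one.
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            int port = server.getLocalPort();  // the port the OS actually assigned
            try (Socket client = new Socket(loopback, port)) {
                return client.isConnected() ? port : -1;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("server port = " + startAndConnect());
        }
    }
}
```

Because the test never asks for a specific port, it does not depend on whether a previously used port is still held by the TCP stack.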
I'm not (too) familiar with the Clojure syntax, but you should invoke socket.setReuseAddress(true). This allows the program to reuse the port, even if there may be sockets in the TIME_WAIT state.
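In plain Java, the call order matters: the ServerSocket Javadoc says the behaviour of changing SO_REUSEADDR after a socket is bound is not defined, so the option should be set on an unbound socket before bind(). A minimal sketch follows; the class name ReuseAddressDemo and the port passed in main are assumptions for illustration, and the question's edit reports that this setting alone did not fix that particular test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddressDemo {
    /** Binds a server socket with SO_REUSEADDR enabled; returns true if it is bound with the option on. */
    static boolean bindWithReuse(int port) {
        // The no-argument constructor creates an unbound socket, so options can be set first.
        try (ServerSocket server = new ServerSocket()) {
            server.setReuseAddress(true);  // must happen before bind()
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return server.isBound() && server.getReuseAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Port 0 lets the OS choose a free port; a real test would pass its own port.
        System.out.println("bound with SO_REUSEADDR: " + bindWithReuse(0));
    }
}
```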
The test itself is invalid. Testing this behaviour is pointless and has nothing to do with any required application behaviour: it is just exercising a corner condition in the TCP stack, which certainly no application should try to rely on. I would expect that opening a listening socket on a port that had just been an outbound connected port would never succeed at all due to TIME_WAIT, or at best succeed half the time due to uncertainty as to which end issued the close first.
I would remove the test. The rest of it doesn't do anything useful either.
You might try setReuseAddress(true) on the server sockets.
If another socket on the same port is in the TIME_WAIT state after closing, this flag will allow the socket to bind to the port anyway.