
How to identify a broken socket connection in Java immediately?

Posted 2011-11-04 10:37:35, tags: java, sockets

I have a typical Java client and server. The client sends a request to the server and waits for the response. The client reads up to, say, 100 bytes of data from the socket's input stream into a byte array, and waits for the complete 100-byte response to arrive within a specified timeout period of, say, 3 seconds. The problem is to identify whether the server went down or crashed while (or before) writing the response. Basically, we need to identify whether the socket was broken or the peer disconnected for some reason. Is there a way to identify this?

4 Answers

How to identify a broken socket connection in Java immediately?

You can't detect it immediately, in Java or any other language. TCP/IP doesn't know, so Java can't know. The only sure way to detect a broken TCP connection is by writing to it and catching IOExceptions, and they won't happen immediately.
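
A minimal sketch of write-based detection, assuming the probe bytes are something the peer's protocol tolerates (`WriteProbe` and `detectBrokenByWriting` are hypothetical names, not a library API). It also shows why the failure is not immediate: the first write after the peer disappears normally just lands in the local send buffer, and the IOException only appears on a later write, after the peer's TCP stack has answered with a reset. If the peer's host is unreachable rather than merely closed, even this can take many minutes, because TCP keeps retransmitting before it gives up.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

// Sketch only: WriteProbe is a hypothetical helper, not a library API.
final class WriteProbe {

    /**
     * Writes {@code probe} up to {@code maxAttempts} times, pausing between attempts.
     * Returns true as soon as a write throws IOException (the connection is broken).
     * Returns false if every write was accepted, which does NOT prove the peer is alive.
     * Assumption: the peer's protocol tolerates receiving {@code probe}.
     */
    static boolean detectBrokenByWriting(Socket socket, byte[] probe,
                                         int maxAttempts, long pauseMillis)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                OutputStream out = socket.getOutputStream();
                out.write(probe); // usually succeeds: it only fills the local send buffer
                out.flush();
            } catch (IOException e) {
                // Typically "Broken pipe" or "Connection reset": the peer is gone.
                return true;
            }
            Thread.sleep(pauseMillis); // give the peer's reset (if any) time to arrive
        }
        return false;
    }
}
```

In a real client the probe would be a proper protocol message, such as a heartbeat, rather than arbitrary bytes.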

The best way to identify that the connection is down is to time out the connection, i.e. you expect a response within a given amount of time and flag a failure if that response does not arrive as expected.
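
For the question's case (a 100-byte response expected within 3 seconds), the timeout can be applied to the whole response rather than to each read() call, since Socket.setSoTimeout only bounds a single blocking read. A sketch assuming a fixed-length response; `DeadlineReader.readResponse` is a hypothetical helper:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Sketch only: DeadlineReader is a hypothetical helper, not a library API.
final class DeadlineReader {

    /**
     * Reads exactly n bytes. The whole response must arrive within timeoutMillis
     * (an overall deadline), not just each individual read() call.
     * Throws SocketTimeoutException if the deadline passes and EOFException if the
     * peer closes the connection before n bytes have arrived.
     */
    static byte[] readResponse(Socket socket, int n, long timeoutMillis) throws IOException {
        byte[] buf = new byte[n];
        InputStream in = socket.getInputStream();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int originalTimeout = socket.getSoTimeout();
        int off = 0;
        try {
            while (off < n) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw new SocketTimeoutException(
                            "got " + off + " of " + n + " bytes before the deadline");
                }
                // Shrink the per-read timeout to whatever is left of the overall budget.
                socket.setSoTimeout((int) Math.min(remainingMillis, Integer.MAX_VALUE));
                int r = in.read(buf, off, n - off); // may throw SocketTimeoutException
                if (r == -1) {
                    throw new EOFException("peer closed after " + off + " of " + n + " bytes");
                }
                off += r;
            }
            return buf;
        } finally {
            socket.setSoTimeout(originalTimeout); // restore the caller's setting
        }
    }
}
```

Usage: `byte[] resp = DeadlineReader.readResponse(socket, 100, 3000);`. A SocketTimeoutException only says the response did not arrive in time, not why; an EOFException says the server closed the connection part-way through.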

When you have a graceful disconnection (e.g. the other end calls close()), a read on the connection will tell you, by returning -1, once the buffered data has been drained.
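
A small sketch of the graceful case from the reading side: any data the peer sent before calling close() is still delivered, and only after it has been consumed does read() return -1. `GracefulClose.drainUntilEof` is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

// Sketch only: GracefulClose is a hypothetical helper, not a library API.
final class GracefulClose {

    /**
     * Reads and discards everything still buffered on the socket and returns the
     * number of bytes drained. It returns normally only when read() reports -1,
     * i.e. the peer closed its side gracefully.
     */
    static long drainUntilEof(Socket socket) throws IOException {
        InputStream in = socket.getInputStream();
        byte[] buf = new byte[1024];
        long total = 0;
        int r;
        while ((r = in.read(buf)) != -1) {
            total += r; // data sent before the close arrives first
        }
        return total; // read() returned -1: orderly shutdown by the peer
    }
}
```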

However, if there is some other type of failure, you might not be notified until the OS times out the connection (e.g. after 3 minutes), and indeed you may want to keep the connection. For example, if you pull the network cable out for 10 seconds and put it back in, that need not be a failure.
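
One way to avoid treating a brief outage as a failure is to declare the peer dead only after a grace period without any successful exchange. The class below is a hypothetical sketch of that policy (not a JDK API); it takes the current time as a parameter so the policy is easy to test, and the 30-second grace period used in the usage is an arbitrary assumption.

```java
// Sketch only: GracePeriodDetector is a hypothetical class, not a JDK API.
final class GracePeriodDetector {
    private final long graceMillis;
    private long lastSuccessMillis;

    GracePeriodDetector(long graceMillis, long nowMillis) {
        this.graceMillis = graceMillis;
        this.lastSuccessMillis = nowMillis; // treat "connected" as the first success
    }

    /** Call whenever a response (or any data) arrives from the peer. */
    void recordSuccess(long nowMillis) {
        lastSuccessMillis = nowMillis;
    }

    /** True once the peer has been silent for longer than the grace period. */
    boolean isFailed(long nowMillis) {
        return nowMillis - lastSuccessMillis > graceMillis;
    }
}
```

Usage: `new GracePeriodDetector(30_000, now)` tolerates a 10-second cable pull. In real code `nowMillis` would come from a monotonic clock such as `System.nanoTime() / 1_000_000` rather than wall-clock time, so that clock adjustments do not cause false failures.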

EDIT: I don't believe it's a good idea to be too aggressive in automatically handling connection/service "failures". This is usually better handled by a planned fix to the system, based on investigation of the true cause, e.g. increased bandwidth, redundant connectivity, faster servers, code fixes.

If the connection is broken abnormally, you will receive an IOException when reading. That normally happens quite fast, but there are no guarantees about timing: it all depends on the OS, network hardware, etc. If the remote end gracefully closes the socket, you'll read -1 as the next byte.

Assuming everything else works, if the remote peer (the TCP server) process is killed, the server's operating system closes its socket, so the client usually finds out quickly: a read returns -1, or, if the server had unread data or the client writes to the dead connection, the client receives a TCP RST (reset) and you get an IOException in your client application.
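
The possible results of a single read can be told apart as in the sketch below (`ReadClassifier` and its `Outcome` values are hypothetical names). Note that SocketTimeoutException is itself a subclass of IOException, so it has to be caught first, or a mere timeout would be misreported as a broken connection.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Sketch only: ReadClassifier and Outcome are hypothetical names.
final class ReadClassifier {

    enum Outcome { DATA, PEER_CLOSED, TIMED_OUT, BROKEN }

    /** Performs one read into buf (which must be non-empty) and classifies the result. */
    static Outcome readOnce(Socket socket, byte[] buf) {
        try {
            int n = socket.getInputStream().read(buf);
            // -1: the peer closed its side gracefully and all its data was consumed.
            return n == -1 ? Outcome.PEER_CLOSED : Outcome.DATA;
        } catch (SocketTimeoutException e) {
            // Caught before IOException: nothing arrived within SO_TIMEOUT,
            // which says nothing definite about the connection itself.
            return Outcome.TIMED_OUT;
        } catch (IOException e) {
            // Abnormal failure, e.g. "Connection reset" after a TCP RST.
            return Outcome.BROKEN;
        }
    }
}
```

For local testing, a reset can be provoked on common platforms by calling `setSoLinger(true, 0)` on the server-side socket before `close()`, which makes the close abortive. This only approximates a killed process.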

However, lots of other things besides a process being killed can go wrong: basically anything on the network path between the two processes. A cable is yanked, a router dies, a firewall dies, etc. None of this will be detected immediately.

For the above reasons the general rule is, as pointed out in the answer from EJP, that a broken connection can only be detected by writing to it. This is why it is always recommended that a TCP client and TCP server exchange some kind of heartbeat message at regular intervals. There are different ways to do this. The method I like best is one where the TCP client, in the absence of data received from the TCP server, sends a heartbeat message to the server and expects a reply within a certain time period. This way heartbeat messages are only sent when really needed.
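
A sketch of the client-driven heartbeat described above, assuming a hypothetical one-byte protocol in which the client sends PING (0x01), the server answers PONG (0x02), and every other byte is application data. Heartbeats are only sent after the line has been idle for `idleMillis`; `HeartbeatClient` and its constants are illustrative, not part of any library.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Sketch only: HeartbeatClient and its one-byte PING/PONG protocol are hypothetical.
final class HeartbeatClient {
    static final int PING = 0x01;
    static final int PONG = 0x02;

    /**
     * Returns the next application byte. If nothing arrives for idleMillis, sends a
     * PING; if nothing at all follows within replyMillis, the connection is declared
     * dead with an IOException.
     */
    static int nextDataByte(Socket socket, int idleMillis, int replyMillis) throws IOException {
        InputStream in = socket.getInputStream();
        OutputStream out = socket.getOutputStream();
        boolean pingOutstanding = false;
        while (true) {
            socket.setSoTimeout(pingOutstanding ? replyMillis : idleMillis);
            int b;
            try {
                b = in.read();
            } catch (SocketTimeoutException e) {
                if (pingOutstanding) {
                    throw new IOException("no heartbeat reply within " + replyMillis + " ms", e);
                }
                out.write(PING); // line is idle: ask the server to prove it is alive
                out.flush();
                pingOutstanding = true;
                continue;
            }
            if (b == -1) {
                throw new EOFException("server closed the connection");
            }
            pingOutstanding = false; // any byte from the server is proof of life
            if (b != PONG) {
                return b; // application data
            }
        }
    }
}
```

A real protocol would use framed messages rather than single bytes, but the timing logic is the same: any traffic from the server counts as proof of life, so a busy connection never sends heartbeats.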

A sub-optimal approach, if you cannot implement true heartbeating, is to always read with a timeout. Set the timeout on the socket (Socket.setSoTimeout) and then catch java.net.SocketTimeoutException. This tells you that no data has been received on the socket for x milliseconds.
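
A minimal sketch of reading with a timeout (`TimedRead.readByte` is a hypothetical helper). Per the Socket.setSoTimeout documentation, the socket remains valid after a SocketTimeoutException, so the caller can decide whether x milliseconds of silence matters and simply read again.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.OptionalInt;

// Sketch only: TimedRead is a hypothetical helper, not a library API.
final class TimedRead {

    /**
     * Reads one byte, waiting at most timeoutMillis (must be > 0; 0 would mean
     * "wait forever"). An empty result means no data arrived in that time; it does
     * not by itself mean the connection is broken.
     */
    static OptionalInt readByte(Socket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis);
        try {
            int b = socket.getInputStream().read();
            if (b == -1) {
                throw new EOFException("peer closed the connection");
            }
            return OptionalInt.of(b);
        } catch (SocketTimeoutException e) {
            return OptionalInt.empty(); // silence for timeoutMillis; socket still usable
        }
    }
}
```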

It should be mentioned that there is one scenario where you need neither heartbeating nor a socket timeout: if the TCP client and the TCP server communicate over the loopback interface, a broken connection will always be propagated to both the TCP client application and the TCP server application. This is because, in this case, there is really no network infrastructure between the two processes. So if you have an existing application whose TCP communication isn't well designed (i.e. it doesn't implement some form of heartbeating, or at least reading with a timeout), then as a last resort you may 'fix' the problem by moving the two applications onto the same host and letting them communicate over the loopback interface.