
Blocking recv call hangs if server is down

Another socket problem.

In my client code, I am sending a packet and expecting a response from the server side:


send()

recv() <-- it is blocking

Immediately after send(), the server crashed and rebooted itself. In the meantime, recv() was waiting. But even after the server came back up, the recv() call is still hanging. I have added SIGPIPE signal handling, but it is still not able to recognize that the socket is broken.

When I cancel the operation, I get an error from recv() saying that an interrupt has been issued.

Could anyone help me rectify this error?

This is in a shared library running on a Solaris machine.

Maybe you should set a timeout to handle this case. This can easily be done by using setsockopt and setting the SO_RCVTIMEO option on your socket:

  struct timeval tv;
  tv.tv_sec = 30;
  tv.tv_usec = 0;
  if (setsockopt(socket_fd, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv,  sizeof tv))
  {
    perror("setsockopt");
    return -1;
  }

Another possibility is to use non-blocking sockets and manage reads and writes with poll(2) or select(2). You should take a look at Beej's Guide to Network Programming.

As others have mentioned, you can use select() to set a time limit for the socket to become readable.

By default, the socket will become readable when there's one or more bytes available in the socket receive buffer. I say "by default" because this amount is tunable by setting the socket receive buffer "low water mark" using the SO_RCVLOWAT socket option.

Below is a function you can use to determine if the socket is ready to be read within a specified time limit. It returns 1 if the socket has data available for reading, or 0 if it times out.

The code is based on an example from the book Unix Network Programming (www.unpbook.com), which can provide you with more information.

#include <errno.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait for "timeout" seconds for the socket to become readable */
int readable_timeout(int sock, int timeout)
{
    struct timeval tv;
    fd_set         rset;
    int            isready;

 again:
    /* select() may modify rset and tv, so set them up before every call */
    FD_ZERO(&rset);
    FD_SET(sock, &rset);

    tv.tv_sec  = timeout;
    tv.tv_usec = 0;

    isready = select(sock+1, &rset, NULL, NULL, &tv);
    if (isready < 0) {
        if (errno == EINTR) goto again;
        perror("select"); _exit(1);
    }

    return isready;
}

Use it like this:

if (readable_timeout(sock, 5 /* timeout in seconds */)) {
    recv(sock, ...);
}

You mention handling SIGPIPE on the client side, which is a separate issue. If you are getting this signal, it means your client is writing to the socket even after having received a RST from the server. That is separate from the problem with the blocking call to recv().

One way that could arise is that the server crashes and reboots, losing its TCP state. Your client sends data to the server, which sends back a RST since it no longer has state for the connection. Your client ignores the RST and tries to send more data, and it's this second send() that causes your program to receive the SIGPIPE signal.

What error were you getting from the call to recv()?

The problem is that the connection is never actually closed. (No FIN packets are sent, etc.; the other end just goes away.)

What you want to do is set a timeout for receiving on the socket, using setsockopt(3) with SO_RCVTIMEO as the option_name.
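Once the timeout is set, a recv() that gives up reports it by returning -1 with errno set to EAGAIN or EWOULDBLOCK. A minimal sketch of handling that is shown here; the helper name recv_with_timeout, the millisecond parameter, and the -2 return code are assumptions chosen for this example.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: recv() with a receive timeout given in milliseconds.
 * A timeout_ms of 0 disables the timeout (recv() blocks indefinitely).
 * Returns bytes received (> 0), 0 if the peer closed the connection,
 * -2 if the timeout expired, or -1 on any other error (errno is set). */
ssize_t recv_with_timeout(int sock, void *buf, size_t len, long timeout_ms)
{
    struct timeval tv;
    ssize_t n;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        return -1;

    n = recv(sock, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;   /* timed out: nothing arrived within timeout_ms */
    return n;
}
```

On a -2 result the client can decide to close the socket and reconnect, which is the case that matters when the server has rebooted and will never answer on the old connection.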

Another way to make the recv() call non-blocking on Solaris is to use fcntl() to set the socket descriptor non-blocking:

fcntl(sockDesc, F_SETFL, O_NONBLOCK);

This can be used along with select() to protect your recv() from a spurious select() return value (in case select() returns a positive value but there is no data on the socket).
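A slightly more defensive variant, sketched here with made-up helper names (set_nonblocking, try_recv) and an arbitrary -2 return code, preserves the descriptor's existing flags and turns the "no data after all" case into an explicit result:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: add O_NONBLOCK to the descriptor's existing status
 * flags rather than overwriting them. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: receive on a non-blocking socket.
 * Returns bytes received (> 0), 0 if the peer closed the connection,
 * -2 if no data is available right now, or -1 on any other error. */
ssize_t try_recv(int sock, void *buf, size_t len)
{
    ssize_t n = recv(sock, buf, len, 0);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;   /* readiness was reported, but nothing is there to read */
    return n;
}
```

With this in place, a -2 from try_recv() after select() reported readiness simply means "go back and wait again" instead of a hung call.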


 