send and recv on same socket from different threads not working

I read that it should be safe to call send and recv on the same socket from different threads concurrently, but my program behaves strangely and I don't know what's wrong.

I have two concurrent threads communicating with a client socket:

  1. one calling send on the socket
  2. one calling select and then recv on the same socket

While I'm still sending, the client has already received the data and closed the socket. At the same time, I'm doing a select and recv on that socket; recv returns 0 (since the peer closed the connection), so I close the socket. However, the send has not returned yet, and since I call close on this socket the send call fails with EBADF.

I know the client has received the data correctly, since I print it after I close the socket and it is right. However, on my end the send call still returns an error (EBADF), and I want to fix it so it doesn't fail.

This doesn't always happen; it happens maybe 40% of the time. I don't use sleep anywhere. Am I supposed to pause between sends or recvs?

Here's some code:

Sending:

while(true)
{
    // keep sending until send returns 0
    n = send(_sfd, bytesPtr, sentSize, 0);

    if (n == 0)
    {
        break;
    }
    else if(n<0)
    {
        cerr << "ERROR: send returned an error "<<errno<< endl; // this case is triggered
        return n;
    }

    sentSize -= n;
    bytesPtr += n;
}

Receiving:

while(true)
{
    memset(bufferPointer, 0, sizeLeft);
    n = recv(_sfd, bufferPointer, sizeLeft, 0);
    if (debug) cerr << "Receiving..." << sizeLeft << endl;
    if (n == 0)
    {
        cerr << "Connection closed" << endl; // this case is triggered
        return n;
    }
    else if (n < 0)
    {
        cerr << "ERROR reading from socket" << endl;
        return n;
    }
    bufferPointer += n;
    sizeLeft -= n;
    if (sizeLeft <= 0) break;
}

On the client, I use the same receive code and then call close() on the socket. On my side, I then get 0 from the recv call and also call close() on the socket. Then my send fails. It still hasn't finished?! But my client already got the data!

I must admit I'm surprised you see this problem as often as you do, but it's always a possibility when you're dealing with threads. When you call send() you go into the kernel to append the data to the socket buffer, so it's quite likely that there'll be a context switch, maybe to another process in the system. Meanwhile the kernel has probably buffered and transmitted the packet quite quickly. I'm guessing you're testing on a local network, so the other end receives the data, closes the connection, and sends the corresponding FIN back to your end very quickly. This could all happen while the sending machine is still running other threads or processes, because the latency on a local Ethernet network is so low.

Now the FIN arrives. Your receive thread hasn't done much lately, since it's been waiting for input. Many schedulers will therefore raise its priority quite a bit, and there's a good chance it'll be run next (you don't specify which OS you're using, but this is likely to happen on Linux at least). This thread closes the socket because of its zero-length read. Shortly afterwards the sending thread is woken again, but presumably the kernel notices that the socket is closed before it returns from the blocked send(), and returns EBADF.

Now this is just speculation as to the exact cause; among other things it depends heavily on your platform. But you can see how this could happen.

The easiest solution is probably to use poll() in the sending thread as well, but wait for the socket to become write-ready instead of read-ready. Obviously you also need to wait until there's buffered data to send; how you do that depends on which thread buffers the data. The poll() call lets you detect that the connection has been closed, because it flags the socket with POLLHUP, which you can check for before you try your send().

As a general rule you shouldn't close a socket until you're certain that the send buffer has been fully flushed. You can only be sure of this once the send() call has returned and indicated that all the remaining data has gone out. I've handled this in the past by checking the send buffer when I get a zero read, and if it's not empty I set a "closing" flag. In your case the sending thread would then use this flag as a hint to do the close once everything is flushed. This matters because if the remote end does a half-close with shutdown(), you'll get a zero read even though it might still be reading. You might not care about half-closes, however, in which case your strategy above is OK.

Finally, I personally would avoid the hassle of separate sending and receiving threads and just have a single thread which does both. That's more or less the point of select() and poll(): to allow a single thread of execution to deal with one or more file handles without worrying about performing an operation which blocks and starves the other connections.

Found the problem. It's my send loop. Notice that it's an infinite loop: when I have nothing left to send, sentSize is 0, but I still loop and call send again. By that time the other thread has already closed the socket, so my send call for 0 bytes returns an error.

I fixed it by changing the loop to stop when sentSize reaches 0, and that solved the problem!
