
What could cause a non-blocking socket to block on `recv`?

I have a TCP/IP socket set to non-blocking that blocks anyway. The socket is only referenced in one thread. This code works on Windows (with a few call substitutions) but not on Linux. My code looks like this (don't mind the C-style casts; this was written long ago. I also trimmed it a bit, so let me know if I accidentally trimmed off a step; chances are that I'm actually doing that step. The actual code is on another computer, so I can't copy-paste):

// In the real code, these are class members. I'm not bonkers
int mSocket;
sockaddr_in mAddress;

void CreateSocket(
    unsigned int ipAddress,
    unsigned short port)
{        
    // Omitting my error checking in this question for brevity because everything comes back valid
    mSocket = socket(AF_INET, SOCK_STREAM, 0);  // Not -1

    int oldFlags = fcntl(mSocket, F_GETFL, 0);  // Not -1
    fcntl(mSocket, F_SETFL, oldFlags | O_NONBLOCK);  // Not -1

    mAddress.sin_family = AF_INET;
    mAddress.sin_addr.s_addr = ipAddress;  // address is valid
    mAddress.sin_port = htons((u_short)port);  // port is not 0 and allowed on firewall
    memset(mAddress.sin_zero, 0, sizeof(mAddress.sin_zero));

    // <Connect attempt loop starts here>
    connect(mSocket, (sockaddr*)&mAddress, sizeof(mAddress));  // Not -1 to exit loop
    // <Connect attempt loop ends here>
    // Connection is now successful ('connect' returned a value other than -1)
}

// ... Stuff happens ...

// ... Then this is called because 'select' call shows read data available ...
void AttemptReceive(
    MyReturnBufferTypeThatsNotImportant &returnedBytes)
{
    // Read socket
    const size_t bufferSize = 4096;
    char buffer[bufferSize];
    int result = 0;

    do {
        // Debugging code: sanity checks
        int socketFlags = fcntl(mSocket, F_GETFL, 0);  // Not -1
        printf("result=%d\n", result);
        printf("O_NONBLOCK? %d\n", socketFlags & O_NONBLOCK);  // Always prints "O_NONBLOCK? 2048"

        result = recv(mSocket, buffer, bufferSize, 0);  // NEVER -1 or 0 after hundreds to thousands of calls, then suddenly blocks

        // ... Save off and package read data into user format for output to caller ...
    } while (result == bufferSize);
}

I believe that, because AttemptReceive is called in response to select, the socket just happens to contain a number of bytes exactly equal to a multiple of the buffer size (4096). I've somewhat confirmed this with the printf statements: it never blocks on the first pass through the loop. Every time this bug happens, the last two lines printed before the thread blocks are:

result=4096
O_NONBLOCK? 2048
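
One way to test the "exact multiple of the buffer size" hypothesis is to ask the kernel, right before each recv, whether a read would block. The sketch below is only an illustration: IsReadableNow is a hypothetical helper name that does not appear in the code above, and it assumes a POSIX system where select with a zero timeout is available.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of the original code): returns true if a
// read on fd would return immediately, either with data or with
// end-of-stream. A zero timeout makes select poll instead of wait.
bool IsReadableNow(int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);

    timeval timeout;
    timeout.tv_sec = 0;
    timeout.tv_usec = 0;

    int ready = select(fd + 1, &readSet, NULL, NULL, &timeout);
    return ready > 0 && FD_ISSET(fd, &readSet);
}
```

Logging IsReadableNow(mSocket) next to the existing printf calls would show whether the blocking recv really is the one issued after the socket has been emptied.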

Changing the recv line to recv(mSocket, buffer, bufferSize, MSG_DONTWAIT); actually "fixes" the issue (suddenly, recv occasionally returns -1 with errno EWOULDBLOCK/EAGAIN, which are equal to each other on my OS), but I'm afraid I'm just putting a band-aid on a gushing wound, so to speak. Any ideas?
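
For reference, a read loop that treats EWOULDBLOCK/EAGAIN as "nothing more to read right now" might look like the sketch below. DrainSocket and its std::string output are hypothetical stand-ins for AttemptReceive and the user-format buffer, and MSG_DONTWAIT is assumed to be available (it is on Linux). This is a minimal sketch, not the actual code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: read everything currently queued on the socket.
// Returns true if the socket was simply drained, false if the peer closed
// the connection or a real error occurred (errno then has the details).
bool DrainSocket(int fd, std::string &out)
{
    assert(fd >= 0);
    char buffer[4096];

    for (;;) {
        // MSG_DONTWAIT makes this one call non-blocking regardless of
        // the O_NONBLOCK status flag on the descriptor.
        ssize_t n = recv(fd, buffer, sizeof(buffer), MSG_DONTWAIT);
        if (n > 0) {
            out.append(buffer, static_cast<size_t>(n));
            continue;  // keep going until the kernel reports "empty"
        }
        if (n == 0) {
            return false;  // orderly shutdown by the peer
        }
        if (errno == EINTR) {
            continue;  // interrupted by a signal, retry
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return true;  // no more data for now
        }
        return false;  // genuine error
    }
}
```

Unlike the `while (result == bufferSize)` condition, this loop does not infer "drained" from a short read; it stops only when recv itself says there is nothing left.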

P.S. The address is "localhost", but I don't think it matters.

Note: I'm using an old compiler (not by choice), g++ 4.4.7-23 from 2010. That may have something to do with the issue.

socket() automatically sets O_RDWR on the socket with my operating system and compiler, but it appears that O_RDWR had accidentally been unset on the socket in question at the start of the program (which somehow allowed it to read fine if there was data to read, but block otherwise). Fixing that bug caused the socket to stop blocking. Apparently, both O_RDWR and O_NONBLOCK are required to keep sockets from blocking, at least on my operating system and compiler.
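
A quick way to check for this kind of flag corruption is to inspect both the access mode and O_NONBLOCK after setting up the socket. The sketch below is an assumption-laden illustration: SetNonBlocking, HasExpectedFlags and CreateNonBlockingTcpSocket are hypothetical names, and the expectation that a socket reports O_RDWR is taken from the observation above rather than from any guarantee.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: add O_NONBLOCK while preserving every status flag
// that is already set. Returns the resulting flags, or -1 on failure.
int SetNonBlocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return -1;
    }
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        return -1;
    }
    return fcntl(fd, F_GETFL, 0);
}

// Hypothetical diagnostic: true only if the descriptor reports both
// O_NONBLOCK and a read/write access mode (O_ACCMODE masks the mode bits).
bool HasExpectedFlags(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return false;
    }
    return (flags & O_NONBLOCK) != 0 && (flags & O_ACCMODE) == O_RDWR;
}

// Hypothetical wrapper: create a TCP socket and make it non-blocking.
int CreateNonBlockingTcpSocket()
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        return -1;
    }
    if (SetNonBlocking(fd) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling HasExpectedFlags(mSocket) at the top of AttemptReceive, instead of printing only `socketFlags & O_NONBLOCK`, would have exposed a missing access-mode bit that the original debug output could not show.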

