简体   繁体   中英

What could cause a non-blocking socket to block on `recv`?

I have a TCP/IP socket set to non-blocking that is blocking anyway. The socket is only referenced in one thread. This code works on Windows (with a few call substitutions) but not on Linux. I have code that looks like this (Don't mind the C-style casts -- this was written long ago. Also, I trimmed it up a bit, so let me know if I accidentally trimmed off a step. Chances are that I'm actually doing that step. The actual code is on another computer, so I can't copy-paste.):

// In the real code, these are class members. I'm not bonkers
int mSocket;
sockaddr_in mAddress;

void CreateSocket(
    unsigned int ipAddress,
    unsigned short port)
{        
    // Omitting my error checking in this question for brevity because everything comes back valid
    mSocket = socket(AF_INET, SOCK_STREAM, 0);  // Not -1

    int oldFlags = fctnl(mSocket, F_GETFL, 0);  // Not -1
    fcntl(mSocket, F_SETFL, oldFlags | O_NONBLOCK);  // Not -1

    mAddress.sin_family = AF_INET;
    mAddress.sin_addr.s_addr = ipAddress;  // address is valid
    mAddress.sin_port = htons((u_short)port);  // port is not 0 and allowed on firewall
    memset(mAddress.sin_zero, 0, sizeof(mAddress.sin_zero));

    // <Connect attempt loop starts here>
    connect(mSocket, (sockaddr*)&mAddress, sizeof(mAddress));  // Not -1 to exit loop
    // <Connect attempt loop ends here>
    // Connection is now successful ('connect' returned a value other than -1)
}

// ... Stuff happens ...

// ... Then this is called because 'select' call shows read data available ...
void AttemptReceive(
    MyReturnBufferTypeThatsNotImportant &returnedBytes)
{
    // Read socket
    const size_t bufferSize = 4096;
    char buffer[bufferSize];
    int result = 0;

    do {
        // Debugging code: sanity checks
        int socketFlags = fcntl(mSocket, F_GETFL, 0);  // Not -1
        printf("result=%d\n", result);
        printf("O_NONBLOCK? %d\n", socketFlags & O_NONBLOCK);  // Always prints "O_NONBLOCK? 2048"

        result = recv(mSocket, buffer, bufferSize, 0);  // NEVER -1 or 0 after hundreds to thousands of calls, then suddenly blocks

        // ... Save off and package read data into user format for output to caller ...
    } while (result == bufferSize);
}

I believe, because AttemptReceive is called in response to select, that the socket just happens to contain exactly a number of bytes equal to a multiple of the buffer size (4096). I've somewhat confirmed this with the printf statements, so it never blocks on the first loop-through. Every time this bug happens, the last two lines to get printed before the thread blocks are:

result=4096
O_NONBLOCK? 2048

Changing the recv line to recv(mSocket, buffer, bufferSize, MSG_DONTWAIT); actually "fixes" the issue (suddenly, recv occasionally returns -1 with errno EWOULDBLOCK/EAGAIN (both equal to each other on my OS)), but I'm afraid I'm just putting a band-aid on a gushing wound, so to speak. Any ideas?

PS the address is "localhost", but I don't think it matters.

Note: I'm using an old compiler (not by choice), g++ 4.4.7-23 from 2010. That may have something to do with the issue.

socket() automatically sets O_RDWR on the socket with my operating system and compiler, but it appears that O_RDWR had accidentally gotten unset on the socket in question at the start of the program (which somehow allowed it to read fine if there was data to read, but block otherwise). Fixing that bug caused the socket to stop blocking. Apparently, both O_RDWR and O_NONBLOCK are required to avoid sockets blocking, at least on my operating system and compiler.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM