Socket programming: What is causing the select() system call to not return on the current thread?

I am having a problem with the select() call in a multi-socket app.

Here is how it is supposed to work.

Writer writes: [0010]HelloWorld on a socket, where the first 4 characters are always digits representing the payload size.

Reader should do the following:

  1. call select() to verify that a given socket is readable, then read the first 4 characters, convert them from characters to a number to get the payload size, and allocate a buffer of that size.
  2. copy the characters that follow the first 4 characters from the socket into the buffer for further processing.
  3. read 4 characters again, which should fail; upon failing to read any data, the app should exit cleanly.
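
As a rough illustration of the size-prefix step above, here is a minimal sketch of turning the 4-character header into a payload size. The function name parse_length_prefix is hypothetical and not from the original code, and the sketch assumes the brackets in [0010] are only notation, so the header on the wire is just the four digits. It needs C++17.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not from the original post): parses a 4-character
// ASCII decimal header such as "0010" into a payload size.
// Returns std::nullopt if the header is not exactly four digits.
std::optional<std::size_t> parse_length_prefix(std::string_view header)
{
    if (header.size() != 4) return std::nullopt;

    std::size_t size = 0;
    for (char c : header)
    {
        // Cast to unsigned char: passing a negative char to isdigit is undefined.
        if (!std::isdigit(static_cast<unsigned char>(c))) return std::nullopt;
        size = size * 10 + static_cast<std::size_t>(c - '0');
    }
    return size;
}
```

With the example from the question, the header "0010" yields 10, which matches the 10 characters of HelloWorld. Rejecting a malformed header up front avoids allocating a buffer from garbage input.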

The problem is in the third select() call. Each iteration is a select() followed by a read: we check the socket for readability with select() and, once that is confirmed, proceed with the read. The socket is valid and almost the whole process works fine. But at step 3, just before the read that is expected to fail, I call select() for the last time, and the thread freezes completely inside that call.

I cannot find any sources online that explain this weird behavior. To verify that the thread is not returning, I created a dummy object just before the select() system call and logged its destruction. Unfortunately, the destructor is never called.

The source code is proprietary and cannot be shared.

snippet:

fd_set f_set;
int err = 0;
while(check_edge_case())
{
  if(time_out_valid())
  {
     int nfds = GetFileDescriptor();

     FD_ZERO(&f_set);
     FD_SET(nfds, &f_set);

     for_each (clients, [&](client_t &client)
     {
       int fd = client.descriptor;
       if(fd)
       {
         FD_SET(fd, &f_set);
         nfds = std::max(nfds, fd);
       }
     });
    
    ++nfds;
    
    // perform select
    if(time_out_valid())
    {
      struct timeval time_val = GetTimeOut(); // select() takes a struct timeval*
      err = select(nfds, &f_set, 0,0, &time_val);
    }
    else
    {
      err = select(nfds, &f_set, 0, 0, 0); // blocks in this statement
    }
    
    // check for error
    if(!err)
    {
      err = 0;
      continue;
    }
    else if (err == -1 && errno == EINTR) // interrupted by a signal: retry
    {
      err = 0;
      continue;
    }
    else if (err < 0)
    {
      return errno;
    }
    
    if(FD_ISSET(GetFileDescriptor(), &f_set))
    {
      return ECANCELED;
    }
    
    // further processing for read operation
    bool executed = false;
    for_each (clients, [&](client_t &client)
     {
       int fd = client.descriptor;
       if(FD_ISSET(fd, &f_set))
       {
         if(client.f_read) err = client.f_read();
         else err = 0;
         
         executed = true;
         return; // 'break' is not valid inside a lambda; return ends this callback
       }
     });
    
    if(!executed) return ENODATA;
    
    
  }
}

You are passing the wrong timeout parameter to your second call to select, which is therefore blocking.

From the documentation:

If both fields of the timeval structure are zero, then select() returns immediately. (This is useful for polling.)

If timeout is specified as NULL, select() blocks indefinitely waiting for a file descriptor to become ready.

So pass the address of a zeroed timeval struct rather than 0, and you should be OK.
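
To make the difference concrete, here is a minimal sketch of polling a descriptor with a zeroed timeval. The helper names poll_readable and demo_poll_on_pipe are hypothetical, and a pipe stands in for the socket purely to keep the example self-contained.

```cpp
#include <sys/select.h>
#include <unistd.h>

// Hypothetical helper: asks select() whether fd is readable right now.
// The timeout struct is zeroed, so the call returns immediately.
int poll_readable(int fd)
{
    fd_set read_set;
    FD_ZERO(&read_set);
    FD_SET(fd, &read_set);

    struct timeval no_wait = {0, 0}; // both fields zero: do not block
    return select(fd + 1, &read_set, nullptr, nullptr, &no_wait);
}

// Demo on a pipe: returns true if the poll reports 0 ready descriptors
// while the pipe is empty and 1 once a byte has been written.
bool demo_poll_on_pipe()
{
    int fds[2];
    if (pipe(fds) != 0) return false;

    bool empty_not_ready = (poll_readable(fds[0]) == 0);
    bool wrote = (write(fds[1], "x", 1) == 1);
    bool data_ready = (poll_readable(fds[0]) == 1);

    close(fds[0]);
    close(fds[1]);
    return empty_not_ready && wrote && data_ready;
}
```

Passing nullptr instead of &no_wait in poll_readable would make the first call in the demo block forever, because nothing has been written to the pipe yet.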

@PaulSanders gave the correct answer, just thought I'd explain it a little more. Your select(nfds, &f_set, 0, 0, 0); call is passing a 0 for the timeout argument. This is not the same thing as a zeroed-out timeval. The fifth argument of select is a pointer to a struct timeval, so when you put 0 there, it's read as a null pointer constant, equivalent to a NULL pointer. As the man page says, if the timeout is NULL (that is, the address of a valid timeval is not passed in), then select will block indefinitely, and this is the behavior you're seeing. I would change that code to

// perform select
struct timeval time_val = { 0 }; // zeros the time values in the struct
if(time_out_valid())
{
  // whatever this does to retrieve the timeout, presumably it's
  // returning a non-zeroed struct
  time_val = GetTimeOut();
}
// the address of `time_val` is _not_ 0/NULL. It is valid.
// If `time_out_valid()` is false, then its data fields `tv_sec` and
// `tv_usec` are still both 0, and `select` will return immediately.
err = select(nfds, &f_set, 0, 0, &time_val);

This is just my opinion, but as a practice, I always use NULL for NULL/invalid pointers, and 0 for actual zero values (for ints, for instance). That just makes it clearer to me, at a glance, what data types are being used/expected. However, I've seen plenty of C code that uses 0 for NULL/invalid pointers, so that's hardly a "best practice rule". But, with that in mind, I would change your select call to

err = select(nfds, &f_set, NULL, NULL, &time_val);

I see you've tagged this C++. If you're using modern C++ (C++11 or later), you can probably do

err = select(nfds, &f_set, nullptr, nullptr, &time_val);

as the preferred method, but I have not tested this.

I found the cause of the indefinite select() block: I was not closing the socket after processing.
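
One plausible reading of this, offered as an assumption since the original code is not available: while the peer's end of the connection stays open and idle, the descriptor never becomes readable, so a select() with a NULL timeout never returns. Once the peer end is closed, select() reports the descriptor as readable and read() returns 0 (end of file), which is the "failed read" that step 3 expects. The sketch below shows this on a socketpair. The names wait_readable and demo_eof_after_close are hypothetical, and a short timeout replaces the NULL timeout so the demo cannot hang.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: waits up to 100 ms for fd to become readable
// and returns select()'s result (0 on timeout, 1 if readable, -1 on error).
int wait_readable(int fd)
{
    fd_set read_set;
    FD_ZERO(&read_set);
    FD_SET(fd, &read_set);

    struct timeval timeout = {0, 100000}; // 100 ms, so the demo cannot hang
    return select(fd + 1, &read_set, nullptr, nullptr, &timeout);
}

// Demo: returns true if an open but idle peer leads to a timeout and a
// closed peer makes the socket readable with read() returning 0 (EOF).
bool demo_eof_after_close()
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;

    // Peer is open and has sent nothing: select() times out.
    bool idle_times_out = (wait_readable(sv[0]) == 0);

    close(sv[1]); // the peer closes its end

    // The descriptor is now reported readable, and read() signals EOF.
    char buf[4];
    bool readable = (wait_readable(sv[0]) == 1);
    bool eof = (read(sv[0], buf, sizeof buf) == 0);

    close(sv[0]);
    return idle_times_out && readable && eof;
}
```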
