
Socket programming: What is causing the select() system call not to return on the current thread?

I am having a problem with a select() call in a multi-socket app.

Here is how it is supposed to work.

The writer writes [0010]HelloWorld to a socket, where the first 4 characters are always digits representing the payload size.
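A minimal sketch of turning the 4-character header into a number, assuming the header is exactly four ASCII decimal digits and that the brackets in [0010] are only notation (not sent on the wire). The function name parse_length_prefix is hypothetical and not part of the original code.

```cpp
#include <cstddef>

// Parses a fixed-width 4-digit length prefix such as "0010".
// Returns false (and leaves len untouched) if any of the four
// characters is not an ASCII decimal digit.
bool parse_length_prefix(const char *hdr, std::size_t &len) {
    std::size_t value = 0;
    for (int i = 0; i < 4; ++i) {
        char c = hdr[i];
        if (c < '0' || c > '9') return false;
        value = value * 10 + static_cast<std::size_t>(c - '0');
    }
    len = value;
    return true;
}
```

Checking each character explicitly, rather than calling atoi(), means a corrupted header is reported as an error instead of silently becoming a short or zero length.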

The reader should do the following:

  1. Call select() to verify that a given socket is readable, then read the first 4 characters, convert them to a number to get the buffer size, and allocate a buffer of that size.
  2. Copy the characters that follow the first 4 characters from the socket into the buffer for further processing.
  3. Read 4 characters again. This read should fail, and upon failing to read any data, the app should exit cleanly.
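The reader steps above can be sketched as follows, assuming a blocking descriptor that the caller's select() loop has already reported readable. The names read_exact and read_message are hypothetical; this is an illustration, not the poster's proprietary code. It loops because read() on a stream may return fewer bytes than requested, and it reports a clean end-of-file separately so that step 3 can exit the program.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <unistd.h>

// Reads exactly n bytes unless EOF or an error intervenes.
// Returns the number of bytes read (fewer than n only at EOF), or -1 on error.
ssize_t read_exact(int fd, char *buf, std::size_t n) {
    std::size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r == 0) break;                   // peer closed its end: EOF
        if (r < 0) {
            if (errno == EINTR) continue;    // interrupted by a signal: retry
            return -1;
        }
        got += static_cast<std::size_t>(r);
    }
    return static_cast<ssize_t>(got);
}

// Reads one message made of a 4-digit length followed by the payload.
// Returns 1 on success, 0 on a clean EOF before any header byte (step 3),
// and -1 on a read error or a malformed/truncated message.
int read_message(int fd, std::string &payload) {
    char hdr[4];
    ssize_t r = read_exact(fd, hdr, sizeof hdr);
    if (r == 0) return 0;
    if (r != static_cast<ssize_t>(sizeof hdr)) return -1;

    std::size_t len = 0;
    for (char c : hdr) {
        if (c < '0' || c > '9') return -1;
        len = len * 10 + static_cast<std::size_t>(c - '0');
    }

    payload.assign(len, '\0');               // the buffer "of that size" from step 1
    if (len > 0 &&
        read_exact(fd, &payload[0], len) != static_cast<ssize_t>(len)) {
        return -1;
    }
    return 1;
}
```

Note that the 0 return in step 3 only happens once the writer has closed its end; while the connection stays open and idle, read() (and a select() with no timeout) simply waits.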

The problem is in the third select() call. Each step is a select() followed by a read(): every time, we check select() for the socket's readability and, once that is verified, we proceed with the read. The socket is valid and almost the whole process works just fine, except for the last point in step 3: just before the read that is expected to fail, I call select() one last time, and it completely freezes the thread.

I am not finding any sources online that explain this weird phenomenon. To verify that the thread is not returning, I created a dummy object just before making the select() system call and logged its destruction. Unfortunately, the destructor is never called.

The source code is proprietary and cannot be shared.

Snippet:

fd_set f_set;
int err = 0;
while(check_edge_case())
{
  if(time_out_valid())
  {
     int nfds = GetFileDescriptor();

     FD_ZERO(&f_set);
     FD_SET(nfds, &f_set);

     for_each (clients, [&](client_t &client)
     {
       int fd = client.descriptor;
       if(fd)
       {
         FD_SET(fd, &f_set);
         nfds = std::max(nfds, fd);
       }
     });
    
    ++nfds;
    
    // perform select
    if(time_out_valid())
    {
       struct timeval time_val = GetTimeOut();
      err = select(nfds, &f_set, 0,0, &time_val);
    }
    else
    {
      err = select(nfds, &f_set, 0, 0, 0); // blocks in this statement
    }
    
    // check for error
    if(!err)
    {
      err = 0;
      continue;
    }
     else if (err == -1 && errno == EINTR) // interrupted by a signal: retry
    {
      err = 0;
      continue;
    }
    else if (err < 0)
    {
       return errno;
    }
    
    if(FD_ISSET(GetFileDescriptor(), &f_set))
    {
      return ECANCELED;
    }
    
    // further processing for read operation
    // (a range-based for is used because `break` is not allowed inside a lambda)
    bool executed = false;
    for (client_t &client : clients)
    {
      int fd = client.descriptor;
      if(FD_ISSET(fd, &f_set))
      {
        if(client.f_read) err = client.f_read();
        else err = 0;

        executed = true;
        break;
      }
    }

    if(!executed) return ENODATA;
    
    
  }
}

You are passing the wrong timeout argument to your second call to select(), which is why it blocks.

From the documentation:

If both fields of the timeval structure are zero, then select() returns immediately. (This is useful for polling.)

If timeout is specified as NULL, select() blocks indefinitely waiting for a file descriptor to become ready.

So pass the address of a zeroed timeval struct rather than 0, and you should be OK.
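A minimal sketch of the polling behavior described in the quoted documentation, assuming a single descriptor below FD_SETSIZE. The helper name readable_now is hypothetical. One caveat worth keeping in mind: with a zero timeout, select() returns 0 immediately when nothing is ready, and the loop in the question does "continue" on 0, so it will spin and burn CPU until data arrives rather than sleep.

```cpp
#include <sys/select.h>
#include <unistd.h>   // pipe(), write(), close() for the usage example

// Returns 1 if fd is readable right now, 0 if it is not, -1 on error.
// The zeroed timeval makes select() return immediately (polling)
// instead of blocking the way a NULL timeout does.
int readable_now(int fd) {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = {0, 0};   // both fields zero: poll
    return select(fd + 1, &rfds, nullptr, nullptr, &tv);
}
```

For example, on a fresh pipe readable_now() on the read end returns 0 at once; after one byte is written to the other end it returns 1.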

@PaulSanders gave the correct answer; I just thought I'd explain it a little more. Your select(nfds, &f_set, 0, 0, 0); call passes 0 for the timeout argument. That is not the same thing as a zeroed-out struct timeval. The fifth argument of select is a pointer to a struct timeval, so when you put 0 there, it is a null pointer constant and is converted to a null pointer. As the man page says, if the timeout is NULL (that is, the address of a valid timeval is not passed in), then select will block indefinitely, and this is the behavior you're seeing. I would change that code to

// perform select
struct timeval time_val = { 0 }; // zeros the time values in the struct
if(time_out_valid())
{
  // whatever this does to retrieve the timeout, presumably it's
  // returning a non-zeroed struct
  time_val = GetTimeOut();
}
// the address of `time_val` is _not_ 0/NULL. It is valid.
// If `time_out_valid()` is false, then its data fields `tv_sec` and
// `tv_usec` are still both 0, and `select` will return immediately.
err = select(nfds, &f_set, 0, 0, &time_val);
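The distinction this answer draws (null pointer means "wait forever", pointer to zeroed struct means "poll") can be made explicit with a small helper. This is a sketch under assumptions: the name to_select_timeout is hypothetical, it uses std::optional (C++17), and negative durations are clamped to zero because select() rejects negative fields. Rebuild the timeval before every call, since on Linux select() updates it in place to reflect the time not slept.

```cpp
#include <chrono>
#include <optional>
#include <sys/time.h>

// Maps "no timeout" to a null pointer (select() blocks indefinitely) and any
// duration, including zero (poll), to a filled-in timeval owned by the caller.
struct timeval *to_select_timeout(std::optional<std::chrono::microseconds> timeout,
                                  struct timeval &storage) {
    if (!timeout) return nullptr;
    auto us = timeout->count();
    if (us < 0) us = 0;                        // select() rejects negative values
    storage.tv_sec = static_cast<time_t>(us / 1000000);
    storage.tv_usec = static_cast<suseconds_t>(us % 1000000);
    return &storage;
}
```

A call site would then read err = select(nfds, &f_set, nullptr, nullptr, to_select_timeout(t, time_val)); so the choice between blocking, polling, and a bounded wait is visible in one place.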

This is just my opinion, but as a practice I always use NULL for null/invalid pointers and 0 for actual zero values (for ints, for instance). That makes it clearer to me, at a glance, what data types are being used or expected. However, I've seen plenty of C code that uses 0 for null/invalid pointers, so that's hardly a "best practice rule". With that in mind, I would change your select call to

err = select(nfds, &f_set, NULL, NULL, &time_val);

I see the question is tagged C++. If you're using modern C++ (C++11 or later), you can probably do

err = select(nfds, &f_set, nullptr, nullptr, &time_val);

as the preferred method, but I have not tested this.

I found the cause of the infinite select() blocking: I was not closing the socket after processing.
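Why closing fixes the hang, assuming the socket that was left open is the writer's end (the post does not say which side): a descriptor becomes readable at end-of-file, so once the peer closes, select() reports it readable and read() returns 0. Without the close there is neither data nor EOF, and a select() with a NULL timeout waits forever. The sketch below uses a pipe instead of a socket to stay self-contained; a connected stream socket follows the same readiness rule. The helper name wait_and_read is hypothetical.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/select.h>
#include <unistd.h>

// Blocks in select() with a NULL timeout until fd is readable, then reads.
// Returns read()'s result: >0 bytes of data, 0 if the peer closed its end,
// or -1 on error.
ssize_t wait_and_read(int fd, char *buf, std::size_t n) {
    fd_set rfds;
    int r;
    do {
        FD_ZERO(&rfds);               // the set must be rebuilt before every call
        FD_SET(fd, &rfds);
        r = select(fd + 1, &rfds, nullptr, nullptr, nullptr);
    } while (r < 0 && errno == EINTR);
    if (r < 0) return -1;
    return read(fd, buf, n);
}
```

If the writer end were never closed, the second wait_and_read() call in the usage below would block indefinitely, which matches the symptom in the question.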

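Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.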

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM