Socket programming: what causes a select() system call never to return in the current thread?

Question

I am running into a problem when calling select() in a multi-socket application.

Here is how it is supposed to work.

The writer writes [0010]HelloWorld on the socket, where the first 4 characters are always digits that give the size of the payload.

The reader should do the following:

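1. Call select() to check that the given socket is readable, then read the first 4 characters, convert them from characters to a number to get the buffer size, and allocate a buffer of that size.
2. Copy the characters after the first 4 from the socket into the buffer for further processing.
3. Read 4 characters again. This read is expected to fail, and when no data can be read, the application should exit the program cleanly.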
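The framing in the steps above can be sketched with two small helpers. This is a minimal sketch that assumes the brackets in [0010]HelloWorld are only notation (so the bytes on the wire are 0010HelloWorld) and that the header is always exactly four ASCII decimal digits; encode_frame and parse_header are hypothetical names, not taken from the asker's code, and std::optional requires C++17.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical writer-side helper: builds "<4-digit size><payload>",
// e.g. "0010HelloWorld". Returns std::nullopt if the size needs more than 4 digits.
std::optional<std::string> encode_frame(const std::string &payload) {
    if (payload.size() > 9999) return std::nullopt;
    char header[5];  // four digits plus the '\0' that snprintf writes
    std::snprintf(header, sizeof header, "%04zu", payload.size());
    return std::string(header, 4) + payload;
}

// Hypothetical reader-side helper: converts the four header characters into
// the payload size. Returns std::nullopt if any character is not a digit.
std::optional<std::size_t> parse_header(const char *header) {
    std::size_t size = 0;
    for (int i = 0; i < 4; ++i) {
        unsigned char c = static_cast<unsigned char>(header[i]);
        if (!std::isdigit(c)) return std::nullopt;
        size = size * 10 + static_cast<std::size_t>(c - '0');
    }
    return size;
}
```

With these, the reader reads 4 bytes, calls parse_header on them, and allocates a buffer of the returned size for step 2.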

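The problem occurs on the third select() call. The reader alternates select() and read(): each time, it checks the socket's readability with select() and, once that is confirmed, goes on to read(). The socket is valid and almost the whole process works correctly, except for the last part of step 3, just before the read that is expected to fail: on that final call, select() freezes the thread completely and never returns.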

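I could not find any resource online that explains this strange behavior. To verify that the thread never returns, I created a dummy object just before the select() system call and logged from its destructor. Unfortunately, the destructor is never called.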
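The destructor check described above relies on RAII: a local object's destructor runs when control leaves its scope, whether by returning or by an exception, so a destructor that never runs means the call inside the scope never returned. A minimal sketch of the idea follows; ScopeLogger and run_logged are hypothetical names, not the asker's code, and the log callback stands in for whatever logging the real application uses.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical debugging aid: reports from its destructor that its scope was left.
class ScopeLogger {
public:
    using LogFn = std::function<void(const std::string &)>;

    ScopeLogger(std::string name, LogFn log)
        : name_(std::move(name)), log_(std::move(log)) {}
    ~ScopeLogger() { log_("left scope: " + name_); }

    ScopeLogger(const ScopeLogger &) = delete;
    ScopeLogger &operator=(const ScopeLogger &) = delete;

private:
    std::string name_;
    LogFn log_;
};

// Hypothetical wrapper: runs `call` while a ScopeLogger is alive. The message is
// logged only after `call` returns (or throws); if `call` blocks forever, as
// select() with a NULL timeout can, nothing is ever logged.
template <typename F>
auto run_logged(const std::string &name, ScopeLogger::LogFn log, F call) -> decltype(call()) {
    ScopeLogger guard(name, std::move(log));
    return call();
}
```

In the asker's situation, wrapping the select() call this way and never seeing "left scope: ..." is the signal that select() itself is still blocked.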

The source code is proprietary and cannot be shared.

Snippet:

#include <algorithm>     // std::max
#include <cerrno>        // errno, EINTR, ECANCELED, ENODATA
#include <sys/select.h>  // select, fd_set, FD_ZERO, FD_SET, FD_ISSET

fd_set f_set;
int err = 0;
while(check_edge_case())
{
  if(time_out_valid())
  {
    int nfds = GetFileDescriptor();

    FD_ZERO(&f_set);
    FD_SET(nfds, &f_set);

    for_each (clients, [&](client_t &client)
    {
      int fd = client.descriptor;
      if(fd)
      {
        FD_SET(fd, &f_set);
        nfds = std::max(nfds, fd);
      }
    });

    ++nfds; // select() expects the highest descriptor plus one

    // perform select
    if(time_out_valid())
    {
      struct timeval time_val = GetTimeOut();
      err = select(nfds, &f_set, 0, 0, &time_val);
    }
    else
    {
      err = select(nfds, &f_set, 0, 0, 0); // blocks in this statement
    }

    // check for error
    if(err == 0) // timeout expired with no descriptor ready
    {
      continue;
    }
    else if (err == -1 && errno == EINTR) // interrupted by a signal: retry
    {
      err = 0;
      continue;
    }
    else if (err < 0)
    {
      return errno;
    }

    if(FD_ISSET(GetFileDescriptor(), &f_set))
    {
      return ECANCELED;
    }

    // further processing for read operation
    bool executed = false;
    // a plain range-for loop, because `break` is not allowed inside a lambda
    for (client_t &client : clients)
    {
      int fd = client.descriptor;
      if(FD_ISSET(fd, &f_set))
      {
        if(client.f_read) err = client.f_read();
        else err = 0;

        executed = true;
        break;
      }
    }

    if(!executed) return ENODATA;
  }
}

Answer 1

You are passing the wrong timeout argument to your second call to select(), which is why it blocks.

From the documentation:

If both fields of the timeval structure are zero, then select() returns immediately. (This is useful for polling.)

If timeout is specified as NULL, select() blocks indefinitely waiting for a file descriptor to become ready.

So pass the address of a zeroed timeval structure instead of 0, and you should be fine.
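The polling behavior this answer relies on can be sketched as follows, assuming a POSIX system; readable_now is a hypothetical helper name. It passes the address of a zeroed struct timeval, so select() checks the descriptor and returns at once instead of waiting. Keep in mind that with a zero timeout select() never waits, so a loop built only on it spins rather than sleeps.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: returns 1 if fd is readable right now, 0 if not, -1 on error.
// A zeroed timeout makes select() poll; a NULL timeout would block indefinitely.
int readable_now(int fd) {
    fd_set read_set;
    FD_ZERO(&read_set);
    FD_SET(fd, &read_set);
    struct timeval zero_timeout{};  // value-initialized: tv_sec and tv_usec are both 0
    int ready = select(fd + 1, &read_set, nullptr, nullptr, &zero_timeout);
    if (ready < 0) return -1;
    return FD_ISSET(fd, &read_set) ? 1 : 0;
}
```

Calling readable_now on a freshly created socketpair() end returns 0 immediately; after the other end writes data it returns 1.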

Answer 2

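@PaulSanders gave the correct answer; I just thought I would explain it a bit more. Your select(nfds, &f_set, 0, 0, 0); call passes 0 as the timeout argument, and that is not the same as a zeroed timeval. The fifth parameter of select is a pointer to a struct timeval, so when you put 0 there, it is read as a null pointer constant, that is, a null struct timeval pointer. As the man page says, if timeout is NULL (you did not pass the address of a valid timeval), select blocks indefinitely, which is exactly the behavior you are seeing. I would change that code to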

// perform select
struct timeval time_val = { 0 }; // zeros every field of the struct
if(time_out_valid())
{
  // whatever this does to retrieve the timeout, presumably it's
  // returning a non-zeroed struct
  time_val = GetTimeOut();
}
// the address of `time_val` is _not_ 0/NULL. It is valid.
// If `time_out_valid()` is false, then its data fields `tv_sec` and
// `tv_usec` are still both 0, and `select` will return immediately.
err = select(nfds, &f_set, 0, 0, &time_val);

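This is just my opinion, but as a practice I always use NULL for null/invalid pointers and 0 for actual zero values (for example, for ints). At a glance, that makes it clearer to me what data type is being used or expected. That said, I have seen plenty of C code that uses 0 for null/invalid pointers, so this is hardly a "best practice rule". With that in mind, though, I would change your select call to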

err = select(nfds, &f_set, NULL, NULL, &time_val);

I see you tagged this c++. If you are using modern C++ (C++11 or later), you can probably write

err = select(nfds, &f_set, nullptr, nullptr, &time_val);

as the preferred approach, but I have not tested this.

Answer 3

I found the cause of the endless select() blocking: I was not closing the socket after processing.
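Why the missing close() matters can be sketched with a socketpair() standing in for the asker's real (unshared) sockets. This is a minimal sketch assuming a POSIX system and assuming the socket left open was the writer's end; wait_and_read, read_exact and run_frame_demo are hypothetical names. Once the writing end is closed, the reading descriptor becomes readable and read() returns 0 (end of file), so even a select() with a NULL timeout returns and the step-3 read fails as intended. While the writer keeps its end open without sending anything, select() has nothing to report and waits indefinitely.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstddef>
#include <string>

// Hypothetical helper: waits in select() with a NULL timeout (no time limit)
// until fd is readable, then reads up to len bytes. Returns read()'s result:
// > 0 bytes read, 0 end of file (peer closed), -1 error.
ssize_t wait_and_read(int fd, char *buf, std::size_t len) {
    fd_set read_set;
    FD_ZERO(&read_set);
    FD_SET(fd, &read_set);
    if (select(fd + 1, &read_set, nullptr, nullptr, nullptr) < 0) return -1;
    return read(fd, buf, len);
}

// Hypothetical helper: reads exactly len bytes, looping over short reads.
bool read_exact(int fd, char *buf, std::size_t len) {
    std::size_t got = 0;
    while (got < len) {
        ssize_t n = wait_and_read(fd, buf + got, len - got);
        if (n <= 0) return false;  // error or premature end of file
        got += static_cast<std::size_t>(n);
    }
    return true;
}

// Hypothetical demo of the question's three steps over a socketpair.
// Returns the payload, or an empty string if any step misbehaves.
std::string run_frame_demo() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return std::string();

    const char frame[] = "0010HelloWorld";           // 4-digit size + payload
    const std::size_t frame_len = sizeof frame - 1;  // exclude the trailing '\0'
    bool ok = write(sv[1], frame, frame_len) == static_cast<ssize_t>(frame_len);
    close(sv[1]);  // writer is done; without this the step-3 select() never returns

    std::string payload;
    char header[4];
    ok = ok && read_exact(sv[0], header, sizeof header);  // step 1: size header
    if (ok) {
        std::size_t size = 0;
        for (char c : header)  // assumes a well-formed, all-digit header
            size = size * 10 + static_cast<std::size_t>(c - '0');
        payload.resize(size);
        ok = read_exact(sv[0], &payload[0], size);  // step 2: payload
    }
    char extra[4];
    ok = ok && wait_and_read(sv[0], extra, sizeof extra) == 0;  // step 3: EOF
    close(sv[0]);
    return ok ? payload : std::string();
}
```

Removing the close(sv[1]) line makes the step-3 wait_and_read() call block forever, which matches the symptom described in the question.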


3 solutions

Solution 1: score 2, 2021-05-25 00:12:30
Solution 2: score 1, 2021-05-25 03:26:14
Solution 3: score 0, accepted, 2021-06-09 06:32:24
