Socket programming: what causes the select() system call to never return on the calling thread?

Question

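I am having trouble calling select() in a multi-socket application.

This is how it is supposed to work.

The writer writes [0010]HelloWorld to the socket, where the first 4 characters are always digits giving the size of the payload.

The reader should do the following:

1. Call select() to verify that the given socket is readable, then read the first 4 characters, convert the chars to digits to get the buffer size, and allocate a buffer of that size.
2. Copy the characters from the socket (those after the first 4) into the buffer for further processing.
3. read 4 characters again. This should fail, and when no data can be read the application should exit cleanly.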
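The three reader steps above can be sketched as follows. This is only an illustration, not the asker's proprietary code: `read_exact` and `read_message` are hypothetical helper names, and the sketch assumes the header is exactly 4 ASCII digits with no brackets on the wire. It always passes a finite timeout to `select()`, so a quiet peer cannot stall the thread forever.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: wait until fd is readable, then read exactly len bytes.
// Returns false on end-of-file, timeout, or error.
static bool read_exact(int fd, char *buf, std::size_t len, int timeout_sec)
{
    std::size_t got = 0;
    while (got < len) {
        fd_set rset;
        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        struct timeval tv = { timeout_sec, 0 };    // finite timeout, never NULL

        int rc = select(fd + 1, &rset, nullptr, nullptr, &tv);
        if (rc == -1 && errno == EINTR) continue;  // interrupted, try again
        if (rc <= 0) return false;                 // timeout or error

        ssize_t n = read(fd, buf + got, len - got);
        if (n == -1 && errno == EINTR) continue;
        if (n <= 0) return false;                  // 0 means the peer closed
        got += static_cast<std::size_t>(n);
    }
    return true;
}

// Hypothetical helper: read one message made of a 4-digit size and a payload.
static bool read_message(int fd, std::string &payload, int timeout_sec)
{
    char header[4];
    if (!read_exact(fd, header, sizeof header, timeout_sec)) return false;

    std::size_t size = 0;
    for (char c : header) {
        if (c < '0' || c > '9') return false;      // malformed header
        size = size * 10 + static_cast<std::size_t>(c - '0');
    }

    payload.assign(size, '\0');
    return size == 0 || read_exact(fd, &payload[0], size, timeout_sec);
}
```

With this shape, step 3 is simply a second `read_message` call that returns `false` once the writer has closed its end or the timeout expires.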

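The problem is in the third select call. The flow is a loop of select followed by read: each time we check the socket's readability with select(), and once that is verified we go on to read. The socket is valid and almost the whole flow works. The exception is the last part of step 3, just before the read that is expected to fail: my final call to select() freezes the thread completely.

I could not find any resource online that explains this odd behavior. To verify that the thread never returns, I created a dummy object just before the select() system call and logged from its destructor. Unfortunately, the destructor is never called.

The source code is proprietary and cannot be shared.

Snippet: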

fd_set f_set;
int err = 0;
while(check_edge_case())
{
  if(time_out_valid())
  {
     int nfds = GetFileDescriptor();

     FD_ZERO(&f_set);
     FD_SET(nfds, &f_set);

     for_each (clients, [&](client_t &client)
     {
       int fd = client.descriptor;
       if(fd)
       {
         FD_SET(fd, &f_set);
         nfds = std::max(nfds, fd);
       }
     });
    
    ++nfds;
    
    // perform select
    if(time_out_valid())
    {
      struct timeval time_val = GetTimeOut();
      err = select(nfds, &f_set, 0,0, &time_val);
    }
    else
    {
      err = select(nfds, &f_set, 0, 0, 0); // blocks in this statement
    }
    
    // check for error
    if(!err)
    {
      err = 0;
      continue;
    }
    else if (err == -1 && errno == EINTR)
    {
      err = 0;
      continue;
    }
    else if (err < 0)
    {
      return errno;
    }
    
    if(FD_ISSET(GetFileDescriptor(), &f_set))
    {
      return ECANCELED;
    }
    
    // further processing for read operation
    bool executed = false;
    for_each (clients, [&](client_t &client)
     {
       // a lambda body cannot `break`; skip the remaining clients instead
       if(executed) return;

       int fd = client.descriptor;
       if(FD_ISSET(fd, &f_set))
       {
         if(client.f_read) err = client.f_read();
         else err = 0;
         
         executed = true;
       }
     });
    
    if(!executed) return ENODATA;
    
    
  }
}

Answer 1

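You are passing the wrong timeout argument to your second call to select, which is why it blocks.

From the documentation:

If both fields of the timeval structure are zero, then select() returns immediately. (This is useful for polling.)

If timeout is specified as NULL, select() blocks indefinitely waiting for a file descriptor to become ready.

So pass the address of a zeroed timeval structure instead of 0, and you should be fine.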
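As a minimal sketch of the polling form described above, the helper below passes the address of a zeroed `timeval`, so `select()` reports the current state and returns at once. The name `poll_readable` is hypothetical and is not part of the asker's code.

```cpp
#include <sys/select.h>
#include <unistd.h>

// Hypothetical helper: returns 1 if fd is readable right now, 0 if it is not,
// and -1 on error. It never blocks because the timeout is zero, not NULL.
static int poll_readable(int fd)
{
    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    struct timeval tv = { 0, 0 };   // both fields zero: pure poll
    return select(fd + 1, &rset, nullptr, nullptr, &tv);
}
```

Calling it on the read end of an empty pipe yields 0 immediately; after one byte is written to the other end it yields 1.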

Answer 2

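@PaulSanders gave the correct answer; I just thought I would explain a bit more. Your call select(nfds, &f_set, 0, 0, 0); passes a 0 for the timeout argument. That is not the same as a zeroed timeval. The fifth parameter of select is a pointer to a struct timeval, so when you put 0 there it is read as a null pointer constant, that is, a null timeval pointer. As the man page says, if timeout is NULL (no valid timeval address is passed in), select blocks indefinitely, which is the behavior you are seeing. I would change that code to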

// perform select
struct timeval time_val = { 0 }; // zeros the time values in the struct
if(time_out_valid())
{
  // whatever this does to retrieve the timeout, presumably it's
  // returning a non-zeroed struct
  time_val = GetTimeOut();
}
// the address of `time_val` is _not_ 0/NULL. It is valid.
// If `time_out_valid()` is false, then its data fields `tv_sec` and
// `tv_usec` are still both 0, and `select` will return immediately.
err = select(nfds, &f_set, 0, 0, &time_val);

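This is only my opinion, but as a practice I always use NULL for null/invalid pointers and 0 for actual zero values (e.g. for ints). At a glance, it makes the data type being used or expected clearer to me. That said, I have seen plenty of C code that uses 0 for null/invalid pointers, so this is hardly a "best practice rule". With that in mind, I would change your select call to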

err = select(nfds, &f_set, NULL, NULL, &time_val);

I see you have tagged this c++. If you are using modern C++, you can probably do

err = select(nfds, &f_set, nullptr, nullptr, &time_val);

as the preferred approach, but I have not tested this.

Answer 3

I found the cause of the indefinite select blocking: I was not calling close on the socket after processing.
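The answer does not say which side left the socket open. One plausible reading, offered here as an assumption, is that the peer never closed its end, so the reader never saw end-of-file and a `select()` with a NULL timeout had nothing to wake it. The sketch below shows how a close becomes visible to the reader: `select()` marks the socket readable and a peek with `recv()` returns 0. `PeerState` and `peer_state` are hypothetical names.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum class PeerState { Data, Closed, Idle, Error };

// Hypothetical helper: classify what the peer of a connected socket is doing.
static PeerState peer_state(int sock, int timeout_sec)
{
    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(sock, &rset);
    struct timeval tv = { timeout_sec, 0 };

    int rc = select(sock + 1, &rset, nullptr, nullptr, &tv);
    if (rc < 0)  return PeerState::Error;
    if (rc == 0) return PeerState::Idle;    // still open, nothing sent in time

    char probe;
    ssize_t n = recv(sock, &probe, 1, MSG_PEEK);   // look without consuming
    if (n < 0) return PeerState::Error;
    return n == 0 ? PeerState::Closed : PeerState::Data;
}
```

On a `socketpair()`, the state is `Idle` while both ends are open and silent, and becomes `Closed` as soon as the other end calls `close()`. Once `Closed` is seen, the reader would typically close its own descriptor and stop adding it to the `fd_set`.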

Answer 1
2 2021-05-25 00:12:30

Answer 2
1 2021-05-25 03:26:14

Answer 3
0 Accepted 2021-06-09 06:32:24
