How can I be sure that threads are blocked?

Question

I have a multithreaded C benchmark that can be described as follows:

Thread 1   Thread 2   Thread 3       Control thread

while(1)   while(1)    while(1)       while(1)
   |          |          |             
   |          |          |                |             
   |          |          |            every one second: 
   |          |          |               wait for other threads to be blocked
   |          |          |               do something with S values
   |          |          |                |             
   |          |          |                |             
 write S1    write S2   write S3          |
   |          |          |                |          
   |          |          |                |
 barrier     barrier   barrier         barrier

My question concerns the "wait for other threads to be blocked" statement in the diagram above. So far I have come up with the following solution to implement it:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <inttypes.h>

#define NB_THREADS 11

pthread_barrier_t b;
uint8_t blocked_flags[NB_THREADS] = {0};
pthread_mutex_t blocked_flags_mutexes[NB_THREADS];
uint64_t states[NB_THREADS] = {0};

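// Returns end - start in nanoseconds. Integer arithmetic is used throughout:
// a CLOCK_REALTIME value in nanoseconds does not fit exactly in a double.
uint64_t time_diff_get(struct timespec *start, struct timespec *end) {
  uint64_t end_ns = (uint64_t)end->tv_sec * 1000000000ULL + (uint64_t)end->tv_nsec;
  uint64_t start_ns = (uint64_t)start->tv_sec * 1000000000ULL + (uint64_t)start->tv_nsec;
  return end_ns - start_ns;
}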

static void *worker_thread(void *arg) {
  uint8_t id = *((uint8_t *)arg);
  int a =  0;
  while(1) {
    for (int i = 0; i < 1000; i++) {
      a++;
    }
    states[id]++;
    pthread_mutex_lock(&blocked_flags_mutexes[id]);
    blocked_flags[id] = 1;
    pthread_mutex_unlock(&blocked_flags_mutexes[id]);
    pthread_barrier_wait(&b);
    pthread_mutex_lock(&blocked_flags_mutexes[id]);
    blocked_flags[id] = 0;
    pthread_mutex_unlock(&blocked_flags_mutexes[id]);
  }
  printf ("a = %d\n", a);
  return NULL;
}

static void *control_thread(void *arg) {
  (void)arg; // unused; the parameter is required by pthread_create

  struct timespec last_time;
  clock_gettime(CLOCK_REALTIME, &last_time);

  while(1) {

    struct timespec time;
    clock_gettime(CLOCK_REALTIME, &time);
    if (time_diff_get(&last_time, &time) >= 1E9) {

      // Wait for all threads to be blocked
      for (int i = 0; i < NB_THREADS; i++) {
        while (1) {
          pthread_mutex_lock(&blocked_flags_mutexes[i]);
          if (blocked_flags[i] == 1) {
            pthread_mutex_unlock(&blocked_flags_mutexes[i]);
            break;
          }
          pthread_mutex_unlock(&blocked_flags_mutexes[i]);
        }
      }
      for (int i = 0; i < NB_THREADS; i++) {
        pthread_mutex_lock(&blocked_flags_mutexes[i]);
        if (blocked_flags[i] == 0) {
          printf("How could I avoid to be there ??\n");
          exit(-1);
        }
        pthread_mutex_unlock(&blocked_flags_mutexes[i]);
      }

      // Do some interesting stuff here with states array
      // .....
      // .....

      // Save last time
      clock_gettime(CLOCK_REALTIME, &last_time);
    }

    pthread_barrier_wait(&b);
  }
  return NULL;
}

int main() {

  // Init barrier
  pthread_barrier_init(&b, NULL, NB_THREADS + 1);

  // Create worker threads
  pthread_t threads[NB_THREADS];
  uint8_t ids[NB_THREADS];
  for (int i = 0; i < NB_THREADS; i++) {
    ids[i] = i;
    pthread_mutex_init(&blocked_flags_mutexes[i], NULL);
  }
  for (int i = 0; i < NB_THREADS; i++) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    cpu_set_t cpu_set;
    CPU_ZERO(&cpu_set);
    CPU_SET(i + 1, &cpu_set);
    pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpu_set);
    pthread_create(&threads[i], &attr, worker_thread, &ids[i]);
  }

  // Create control thread
  pthread_t ctrl_thread;
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  cpu_set_t cpu_set;
  CPU_ZERO(&cpu_set);
  CPU_SET(0, &cpu_set);
  pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpu_set);
  pthread_create(&ctrl_thread, &attr, control_thread, NULL);

  // Join on worker threads
  for (int i = 0; i < NB_THREADS; i++) {
    pthread_join(threads[i], NULL);
  }

  return 0;
}

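But running this benchmark, compiled with gcc -O0, on a 12-core Intel platform clearly shows that I have a race somewhere, because the process always exits after a few seconds. How can I fix this?

Note: other questions I have read suggest using a custom barrier, but I need to keep using pthread_barrier rather than re-implementing a barrier on top of a mutex and a condition variable.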

Answer 1

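Your code has an obvious race condition. When your threads are released from the barrier wait, they reset their flag to zero. Until they do so, their flag is still 1 for a while. The control thread can observe this stale 1 and conclude that the corresponding thread is about to block, when in fact that thread has just come out of the barrier wait and is about to clear the flag: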

// worker thread
pthread_barrier_wait(&b);
// No longer blocked, but blocked_flags[id] is still 1.
// At this point, the control thread grabs the mutex, and observes the 1 value
// The mistake is thinking that 1 means "I'm about to block"; it actually
// means, "I'm either about to block on the barrier, or have just finished".
pthread_mutex_lock(&blocked_flags_mutexes[id]);
blocked_flags[id] = 0;
pthread_mutex_unlock(&blocked_flags_mutexes[id]);

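This race is enough to fool the control thread into thinking that everyone is blocked, so it gets through its first loop. It then falls into the second loop, where it finds a flag that has since been reset to 0.

The essence of your problem is that you have repeated, cyclic parallel processing done by a group of threads and governed by a barrier. However, you use only one barrier wait per cycle, which means the cycle has only one phase. Semantically, though, your cycle has two phases: threads blocked and threads unblocked. The mechanism you built to tell these phases apart is not thread-safe; the obvious solution is to use the barrier again to split the cycle into more phases.

POSIX barriers have a "serial thread" feature: one of the waiting threads is told that it is special (pthread_barrier_wait returns PTHREAD_BARRIER_SERIAL_THREAD to it). This lets you implement special phases in which only the serial thread does something important, while the other threads do something else, such as calling the barrier wait to skip to the next phase. This should eliminate the need for hacks like flags, whereby one thread tries to guess when the others have become quiescent.

Note: you cannot choose which thread is the serial thread in a POSIX barrier wait, so you cannot have a dedicated control thread for this. You simply use N threads rather than N + 1. They all do the same thing when they reach the barrier, and any of them can be told that it is the serial thread. Based on that, the serial thread runs some alternative code that the others do not.

So, diagram time: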

while(1)   while(1)    while(1)       
   |          |          |             
   |          |          |          
   |          |          | 
   |          |          |   <---- WRITE PHASE  
   |          |          |  
   |          |          |             
   |          |          |                 
 write S1    write S2   write S3
   |          |          |           
   |          |          |      
 barrier     barrier   barrier 
   |          |          |        
   |          |          |     <--- CHECK PHASE
   |          |          |           
   |          |     serial thread!   
   |          |          |           
   |          |       next second?-- YES -> do something with S values!
   |          |          |  NO        |
   |          |          |            |
   |          |          +------------+ 
   |          |          | 
 barrier     barrier   barrier
   |          |          | 
   |          |          | 

back to top, next WRITE PHASE.

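Here, in the CHECK PHASE, the serial thread (which can be any of the N threads) performs the check: has the time crossed into the next second since the last time the action was taken? If so, it does something with the S values.

The barriers ensure that the other threads do not touch the values during the CHECK PHASE, so the serial thread needs no mutex to work with the S values! You have already paid for this synchronization with the extra barrier call in each cycle.

You could have an extra thread that provides the time base: its job is to sleep until the next second arrives and then increment a counter. The serial thread just checks whether this counter has been incremented (relative to its old value, stored in another variable). If so, it performs the action and updates the old counter value to match the new one. This way you do not have to call into the operating system to get the current time in the main processing loop.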
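A minimal sketch of the two-barrier cycle, under some assumptions that are not in the original code: NB_THREADS is reduced to 4, the loop is bounded by a hypothetical nb_rounds so the program terminates, and the serial thread acts on every cycle (the once-per-second test would sit at the marked comment). The names run_two_phase, checks_done and last_sum are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NB_THREADS 4            /* assumption: smaller than the question's 11 */

static pthread_barrier_t b;
static uint64_t states[NB_THREADS];
static uint64_t checks_done;    /* hypothetical; written only by the serial thread */
static uint64_t last_sum;       /* hypothetical; written only by the serial thread */
static int nb_rounds;           /* hypothetical bound so the sketch terminates */

static void *worker_thread(void *arg) {
  int id = *(int *)arg;
  for (int round = 0; round < nb_rounds; round++) {
    /* WRITE PHASE: each thread touches only its own slot. */
    states[id]++;

    /* First barrier: ends the write phase. Exactly one waiter receives
       PTHREAD_BARRIER_SERIAL_THREAD; which one is unspecified. */
    int rc = pthread_barrier_wait(&b);

    /* CHECK PHASE: only the serial thread reads the S values. No mutex is
       needed because every other thread is heading for the second barrier. */
    if (rc == PTHREAD_BARRIER_SERIAL_THREAD) {
      /* The "next second?" test would go here. */
      uint64_t sum = 0;
      for (int i = 0; i < NB_THREADS; i++)
        sum += states[i];
      last_sum = sum;
      checks_done++;
    }

    /* Second barrier: ends the check phase, back to the write phase. */
    pthread_barrier_wait(&b);
  }
  return NULL;
}

/* Runs NB_THREADS workers for `rounds` cycles; returns 0 on success. */
static int run_two_phase(int rounds) {
  pthread_t threads[NB_THREADS];
  int ids[NB_THREADS];

  assert(rounds >= 0);
  nb_rounds = rounds;
  checks_done = 0;
  last_sum = 0;
  for (int i = 0; i < NB_THREADS; i++)
    states[i] = 0;

  /* N participants, not N + 1: there is no dedicated control thread. */
  if (pthread_barrier_init(&b, NULL, NB_THREADS) != 0)
    return -1;
  for (int i = 0; i < NB_THREADS; i++) {
    ids[i] = i;
    /* A partial start would leave the barrier waiting forever, so give up. */
    if (pthread_create(&threads[i], NULL, worker_thread, &ids[i]) != 0)
      abort();
  }
  for (int i = 0; i < NB_THREADS; i++)
    pthread_join(threads[i], NULL);
  pthread_barrier_destroy(&b);
  return 0;
}
```

Calling run_two_phase(1000) from main runs 1000 cycles; afterwards checks_done equals the number of cycles, since exactly one thread is the serial thread each time. Build with -pthread.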
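The time-base thread could look roughly like the sketch below. It is only an illustration: tick_count, last_seen_tick, ticker_stop, ticker_thread and tick_elapsed are hypothetical names, the period is passed in as nanoseconds, and a relative nanosleep is assumed to be accurate enough (it drifts slightly over time).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static atomic_uint_fast64_t tick_count;  /* incremented only by the ticker */
static atomic_bool ticker_stop;          /* set to true to end the ticker */

/* Old counter value. Meant to be touched only in the check phase, where the
   barriers already order accesses between successive serial threads. */
static uint64_t last_seen_tick;

/* Sleeps for one period, then bumps the counter. arg points to a long
   holding the period in nanoseconds (1000000000L for one second). */
static void *ticker_thread(void *arg) {
  const long period_ns = *(const long *)arg;
  struct timespec period = {
    .tv_sec = period_ns / 1000000000L,
    .tv_nsec = period_ns % 1000000000L
  };
  while (!atomic_load(&ticker_stop)) {
    nanosleep(&period, NULL);
    atomic_fetch_add(&tick_count, 1);
  }
  return NULL;
}

/* Called by the serial thread: returns 1 if the counter has advanced since
   the last call, and records the new value. No system call is made here. */
static int tick_elapsed(void) {
  uint64_t now = atomic_load(&tick_count);
  if (now == last_seen_tick)
    return 0;
  last_seen_tick = now;
  return 1;
}
```

The serial thread would call tick_elapsed() in the check phase and work on the S values only when it returns 1. The counter is atomic because the ticker runs outside the barrier cycle.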

Answer 2

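Instead of keeping one flag per worker thread, you could mutex-protect a single counter, which each worker thread increments just before blocking and decrements right after the barrier releases it. This avoids waiting for the first thread to be blocked, then the second, then the third, and so on.

I do not see where your control thread exits (other than in the unexpected case), and the main thread does not seem to wait for it.

Perhaps you also want to create your control thread before the worker threads.

Perhaps you also want the worker threads and the control thread to synchronize by waiting on a barrier before they are released to start the actual work!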
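A rough sketch of such a counter follows; blocked_count, blocked_enter, blocked_leave and all_blocked are hypothetical names. Note that it only replaces the per-thread polling loops with a single test. A worker that has just left the barrier but has not yet called blocked_leave() is still counted, so the counter alone does not close that window.

```c
#include <pthread.h>

static pthread_mutex_t blocked_mutex = PTHREAD_MUTEX_INITIALIZER;
static int blocked_count;   /* number of workers that announced "about to block" */

/* Worker: call just before pthread_barrier_wait(). */
static void blocked_enter(void) {
  pthread_mutex_lock(&blocked_mutex);
  blocked_count++;
  pthread_mutex_unlock(&blocked_mutex);
}

/* Worker: call right after pthread_barrier_wait() returns. */
static void blocked_leave(void) {
  pthread_mutex_lock(&blocked_mutex);
  blocked_count--;
  pthread_mutex_unlock(&blocked_mutex);
}

/* Control thread: one test instead of one polling loop per worker. */
static int all_blocked(int nb_workers) {
  pthread_mutex_lock(&blocked_mutex);
  int result = (blocked_count == nb_workers);
  pthread_mutex_unlock(&blocked_mutex);
  return result;
}
```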

Answer 3

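I think what happens may be this:

The first time the while(1) in control_thread() executes, time_diff_get(&last_time, &time) returns a value < 1E9, so the thread goes straight to the barrier.
Now all the worker threads eventually reach the barrier.
Once that has happened, control_thread() executes its loop a second time and immediately checks blocked_flags[i].
If this happens for at least one thread before that thread has reset its flag, you get the failure you describe.

Sorry, I cannot offer a solution right now, but understanding the problem correctly is a good start toward one.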

Answer 1
1 2014-08-14 17:56:35

Answer 2
0 2014-08-14 17:29:30

Answer 3
0 2014-08-14 17:41:45