How do I synchronize the value of a variable across all threads?

Question

If I have the following case:

bool cond_var;

#pragma omp parallel shared(cond_var)
{
    bool some_private_var;
    // ...
    do {
       #pragma omp single
       {
           cond_var = true;
       }

       // do something, calculate some_private_var;
       // ...

       #pragma omp atomic update
       cond_var &= some_private_var;

       // Syncing step
       // (???)

    } while(cond_var);

    // ... (other parallel stuff)
}

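I want every thread's do-while loop to run the same number of iterations, but when I tried #pragma omp barrier as the syncing step (just before the end of the loop), I ended up with a deadlock. Printing the value of cond_var showed that some threads saw it as true while others saw it as false, so the loop finished for some threads and left the others deadlocked at the barrier. I then tried various combinations and orderings of barrier and flush, with no luck (with some combinations the deadlock was merely postponed).

How do I properly combine and synchronize the loop condition among the threads, so that all loops run the same number of iterations?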

UPDATE

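I also tried using #pragma omp atomic read to load the value of cond_var into another private variable and testing that instead. It didn't work either. Apparently an atomic read guarantees that I get a consistent value (either the old one or the new one), but not that it is the latest one.

UPDATE 2

Based on Jonathan Dursi's code, this is an MVCE that looks more like what I am trying to do: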

#include <omp.h>
#include <cstdio>
#include <random>
#include <chrono>
#include <thread>

int main() {

    bool cond_var;
    const int nthreads = omp_get_max_threads();

    #pragma omp parallel default(none) shared(cond_var)
    {
        bool some_private_var;
        std::random_device rd;
        std::mt19937 rng(rd());
        unsigned iter_count = 0;

        /* chance of having to end: 1 in 6**nthreads; all threads must choose 0 */
        std::uniform_int_distribution<int> dice(0,5);

        const int tid = omp_get_thread_num();
        printf("Thread %d started.\n", tid);
        do {
            ++iter_count;

            #pragma omp once shared(cond_var)
            {
                // cond_var must be reset to 'true' because it is the
                // neutral element of &
                // For the loop to end, all threads must choose the
                // same random value 0
                cond_var = true;
            }

            some_private_var = (dice(rng) == 0);

            // If all threads choose 0, cond_var will remain 'true', ending the loop
            #pragma omp atomic update
            cond_var &= some_private_var;

            #pragma omp barrier
        } while(!cond_var);
        printf("Thread %d finished with %u iterations.\n", tid, iter_count);
    }

    return 0;
}

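Running with 8 threads on a machine with enough logical cores to run all of them simultaneously, most runs deadlock in the first iteration, although one run finished correctly on the second iteration (which does not match the expected 1 in 1679616 (6**8) chance of all threads choosing 0).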

Answer 1

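The problem is that in the while loop you update cond_var twice and use it a third time, and you need to make sure these operations don't interfere with one another. On each loop iteration, the code:

1. Sets cond_var = true (using a non-existent OpenMP pragma, "once", which is ignored, so every thread does this)
2. Updates cond_var by &ing it with the local condition variable;
3. Uses the cond_var updated by everyone to test whether to exit the loop.

So you need to make sure that no thread is setting cond_var to true (1) while others are anding into it (2); that no thread is still anding (2) while another uses it to test out of the loop (3); and that no thread is testing it (3) while a thread is setting it to true (1).

The obvious way to do that is with barriers, one between each of these three cases, so three barriers. So this works: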

#include <omp.h>
#include <cstdio>
#include <random>
#include <chrono>
#include <thread>
#include <iostream>

int main() {

    bool cond_var;

    #pragma omp parallel default(none) shared(cond_var,std::cout)
    {
        bool some_private_var;
        std::random_device rd;
        std::mt19937 rng(rd());
        unsigned iter_count = 0;

        std::uniform_int_distribution<int> dice(0,1);

        const int tid = omp_get_thread_num();
        printf("Thread %d started.\n", tid);
        do {
            ++iter_count;

            #pragma omp barrier
            #pragma omp single 
            cond_var = true;
            // implicit barrier here at the end of the single (a nowait clause would remove it)

            some_private_var = (dice(rng) == 0);

            // If all threads choose 0, cond_var will remain 'true', ending the loop
            #pragma omp atomic update
            cond_var &= some_private_var;

            #pragma omp barrier
        } while(!cond_var);

        #pragma omp critical
        std::cout << "Thread " << tid << " finished with " << iter_count << " iterations." << std::endl;
    }

    return 0;
}

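You can do a little better by having each thread set only its own entry in a shared array, and then having a single thread do the anding. You still need two barriers: one to make sure everyone has written its entry before the anding starts, and one to make sure the anding is finished before anyone tests for completion: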

#include <omp.h>
#include <cstdio>
#include <random>
#include <chrono>
#include <thread>
#include <iostream>

int main() {

    bool cond_var;

    const int num_threads = omp_get_max_threads();
    const unsigned int spacing=64/sizeof(bool);  /* to avoid false sharing */
    bool local_cond_var[num_threads*spacing];

    #pragma omp parallel default(none) shared(cond_var,std::cout,local_cond_var)
    {
        std::random_device rd;
        std::mt19937 rng(rd());
        unsigned iter_count = 0;

        std::uniform_int_distribution<int> dice(0,1);

        const int tid = omp_get_thread_num();
        printf("Thread %d started.\n", tid);
        do {
            ++iter_count;

            local_cond_var[tid*spacing] = (dice(rng) == 0);

            #pragma omp barrier
            #pragma omp single
            {
                cond_var = true;
                for (int i=0; i<num_threads; i++)
                    cond_var &= local_cond_var[i*spacing];
            }
            // implicit barrier here at the end of the single (a nowait clause would remove it)
        } while(!cond_var);

        #pragma omp critical
        std::cout << "Thread " << tid << " finished with " << iter_count << " iterations." << std::endl;
    }

    return 0;
}

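Note that barriers, whether explicit or implicit, imply a flush of the shared variables, and that adding a nowait clause to the singles would cause intermittent deadlocks.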
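The two-barrier pattern can also be wrapped in a helper that every thread calls once per iteration. The following is a minimal sketch under stated assumptions, not the answerer's code: the names all_threads_agree and run_agreement_loop and the deterministic voting rule are hypothetical, std::vector<char> stands in for the padded array (so it does not avoid false sharing), and the serial fallback stubs exist only so the sketch also builds without OpenMP.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks (assumption: only needed when built without OpenMP).
static int omp_get_thread_num()  { return 0; }
static int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper: logical AND of one vote per thread.
// Every thread of the team must call it the same number of times.
bool all_threads_agree(bool my_vote, std::vector<char>& votes, bool& result) {
    votes[omp_get_thread_num()] = my_vote;  // each thread writes only its own slot

    #pragma omp barrier                     // all votes are written and flushed
    #pragma omp single
    {
        bool all = true;
        for (int i = 0; i < omp_get_num_threads(); ++i)
            all = all && votes[i];
        result = all;
    }                                       // implicit barrier: result visible to all

    // result is not rewritten before every thread passes the next call's barrier
    return result;
}

// Hypothetical driver: thread t votes "done" from its (t+1)-th iteration on,
// so the whole team agrees on the iteration whose number equals the team size.
std::vector<unsigned> run_agreement_loop() {
    std::vector<char> votes;
    std::vector<unsigned> iterations;
    bool result = false;

    #pragma omp parallel shared(votes, iterations, result)
    {
        #pragma omp single
        {
            votes.assign(omp_get_num_threads(), 0);
            iterations.assign(omp_get_num_threads(), 0);
        }                                   // implicit barrier: vectors are sized

        const int tid = omp_get_thread_num();
        unsigned iter_count = 0;
        bool done = false;
        do {
            ++iter_count;
            const bool vote = iter_count > static_cast<unsigned>(tid);
            done = all_threads_agree(vote, votes, result);
        } while (!done);
        iterations[tid] = iter_count;
    }
    return iterations;
}
```

With this voting rule, run_agreement_loop() should return one entry per thread, all equal to the team size. std::vector<char> is used rather than std::vector<bool> because the latter packs bits, so concurrent writes to neighbouring elements would race.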

Answer 2

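Putting a #pragma omp barrier after the last statement in the loop body did not cause a deadlock for me, but neither is it sufficient. Although the worker threads will wait at the barrier until they can all pass through together, that does not ensure that they have a consistent view of cond_var on the other side. If, on any iteration, the first threads to update cond_var leave it true, then some or all of those threads may perform another iteration even though another thread later sets it to false. Only when those threads return to the atomic update are they certain to see the value written by the other threads.

You should be able to work around that issue by performing an atomic read of the condition variable into a private variable after the barrier, before testing the loop condition. You need to do that or something else to solve the problem, because it violates an OpenMP constraint for different threads in the thread team to reach the barrier a different number of times. In fact, that is probably why your program hangs: the threads that perform an extra iteration are stuck at the barrier, waiting for the others to arrive.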
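The private-copy idea can be sketched as follows. This is an illustration under assumptions, not code from either answer: run_private_copy_loop and the deterministic voting rule are hypothetical, cond_var is an int rather than a bool, and the second barrier after the atomic read is added following the reasoning in Answer 1, so that no thread resets cond_var while another is still reading it. The fallback stubs only let the sketch build without OpenMP.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks (assumption: only needed when built without OpenMP).
static int omp_get_thread_num()  { return 0; }
static int omp_get_num_threads() { return 1; }
#endif

// Hypothetical driver: returns the iteration count of each thread.
std::vector<unsigned> run_private_copy_loop() {
    int cond_var = 1;                       // shared loop condition
    std::vector<unsigned> iterations;

    #pragma omp parallel shared(cond_var, iterations)
    {
        #pragma omp single
        iterations.assign(omp_get_num_threads(), 0);
        // implicit barrier: the vector is sized before anyone uses it

        const int tid = omp_get_thread_num();
        unsigned iter_count = 0;
        int done = 0;                       // private copy of cond_var
        do {
            ++iter_count;

            #pragma omp single
            cond_var = 1;
            // implicit barrier: the reset happens before any update

            // Illustrative vote: thread t says "done" from iteration t+1 on.
            const int vote = (iter_count > static_cast<unsigned>(tid)) ? 1 : 0;
            #pragma omp atomic update
            cond_var &= vote;

            #pragma omp barrier             // every update has been applied

            #pragma omp atomic read
            done = cond_var;                // all threads read the same value

            #pragma omp barrier             // nobody resets until all have read
        } while (!done);
        iterations[tid] = iter_count;
    }
    return iterations;
}
```

Because every thread tests its private copy, taken between the same two barriers, all threads should leave the loop on the same iteration and reach each barrier the same number of times.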
