How to synchronize the value of a variable among all threads?

If I have the following case:

bool cond_var;

#pragma omp parallel shared(cond_var)
{
    bool some_private_var;
    // ...
    do {
       #pragma omp single
       {
           cond_var = true;
       }

       // do something, calculate some_private_var;
       // ...

       #pragma omp atomic update
       cond_var &= some_private_var;

       // Syncing step
       // (???)

    } while(cond_var);

    // ... (other parallel stuff)
}

I want my do-while loop to have the same number of iterations for all threads, but when I tried #pragma omp barrier as the syncing step (just before the end of the loop), I ended up with a deadlock. Printing the value of cond_var showed that some threads saw it as true while others saw it as false, so the loop finished for some threads and left the others deadlocked at the barrier. I then tried various combinations and orderings of barrier and flush, with no luck (with some combinations the deadlock was merely postponed).

How can I properly combine and synchronize the loop condition among the threads so that every thread performs the same number of iterations?

UPDATE

I have also tried loading the value of cond_var into a private variable with #pragma omp atomic read and testing that copy instead. It didn't work either. Apparently, an atomic read guarantees that I get a consistent value (either the old or the new one), but it does not guarantee that it is the latest one.
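The attempted pattern presumably looked something like this sketch (the helper name snapshot_cond is hypothetical, reconstructed from the description above); the atomic read yields an untorn, consistent value, but nothing orders it relative to the other threads' updates:

```cpp
// Hypothetical helper (name and structure assumed): take an atomic
// snapshot of the shared flag and test the snapshot instead of the flag.
// The snapshot is consistent (never torn), but it may be stale -- another
// thread's atomic update can land immediately after the read.
bool snapshot_cond(bool &cond_var) {
    bool local_copy;
    #pragma omp atomic read
    local_copy = cond_var;  // consistent, but not guaranteed to be latest
    return local_copy;
}
```

Testing snapshot_cond(cond_var) in the while condition therefore still races with the next iteration's reset of cond_var; ordering has to come from barriers, not from the atomicity of the read.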

UPDATE 2

Based on Jonathan Dursi's code, this is an MVCE that looks more like what I am trying to do:

#include <omp.h>
#include <cstdio>
#include <random>
#include <chrono>
#include <thread>

int main() {

    bool cond_var;
    const int nthreads = omp_get_max_threads();

    #pragma omp parallel default(none) shared(cond_var)
    {
        bool some_private_var;
        std::random_device rd;
        std::mt19937 rng(rd());
        unsigned iter_count = 0;

        /* chance of having to end: 1 in 6**nthreads; all threads must choose 0 */
        std::uniform_int_distribution<int> dice(0,5);

        const int tid = omp_get_thread_num();
        printf("Thread %d started.\n", tid);
        do {
            ++iter_count;

            #pragma omp once shared(cond_var)
            {
                // cond_var must be reset to 'true' because it is the
                // neutral element of &
                // For the loop to end, all threads must choose the
                // same random value 0
                cond_var = true;
            }

            some_private_var = (dice(rng) == 0);

            // If all threads choose 0, cond_var will remain 'true', ending the loop
            #pragma omp atomic update
            cond_var &= some_private_var;

            #pragma omp barrier
        } while(!cond_var);
        printf("Thread %d finished with %u iterations.\n", tid, iter_count);
    }

    return 0;
}

Running with 8 threads on a machine with enough logical cores to run all of them simultaneously, most runs deadlock in the first iteration, although one run finished correctly on the second iteration (which does not match the 1-in-1679616 (6**8) chance of all threads choosing 0).

The issue is that in the while loop you update cond_var twice and use it a third time, and you need to ensure that these operations don't interfere with each other. Each loop iteration, the code:

  1. sets cond_var = true (using a non-existent OpenMP pragma, "once", which is ignored, so the assignment is done by every thread);
  2. updates cond_var by &-ing it with the local condition variable;
  3. uses the updated-by-everyone cond_var to test whether to exit the loop.

Thus, one needs to make sure that no thread sets cond_var to true (1) while other threads are still and-ing it (2); that no thread is still and-ing (2) while others use it to test out of the loop (3); and that no thread is testing it (3) while another sets it to true (1).

The obvious way to do that is with barriers, one between each pair of those three steps - so three barriers. This works:

#include <omp.h>
#include <random>
#include <chrono>
#include <thread>
#include <iostream>

int main() {

    bool cond_var;

    #pragma omp parallel default(none) shared(cond_var,std::cout)
    {
        bool some_private_var;
        std::random_device rd;
        std::mt19937 rng(rd());
        unsigned iter_count = 0;

        std::uniform_int_distribution<int> dice(0,1);

        const int tid = omp_get_thread_num();
        printf("Thread %d started.\n", tid);
        do {
            ++iter_count;

            #pragma omp barrier
            #pragma omp single 
            cond_var = true;
            // implicit barrier here after the single (a nowait clause would remove it)

            some_private_var = (dice(rng) == 0);

            // If all threads choose 0, cond_var will remain 'true', ending the loop
            #pragma omp atomic update
            cond_var &= some_private_var;

            #pragma omp barrier
        } while(!cond_var);

        #pragma omp critical
        std::cout << "Thread " << tid << " finished with " << iter_count << " iterations." << std::endl;
    }

    return 0;
}

You can do a little better by having every thread set only its own slot in a shared array, and having one single thread do the and-ing; you still need two barriers, one to make sure everyone has written before the and-ing, and one to make sure the and-ing is done before testing for completion:

#include <omp.h>
#include <random>
#include <chrono>
#include <thread>
#include <iostream>

int main() {

    bool cond_var;

    const int num_threads = omp_get_max_threads();
    const unsigned int spacing=64/sizeof(bool);  /* to avoid false sharing */
    bool local_cond_var[num_threads*spacing];

    #pragma omp parallel default(none) shared(cond_var,std::cout,local_cond_var)
    {
        std::random_device rd;
        std::mt19937 rng(rd());
        unsigned iter_count = 0;

        std::uniform_int_distribution<int> dice(0,1);

        const int tid = omp_get_thread_num();
        printf("Thread %d started.\n", tid);
        do {
            ++iter_count;

            local_cond_var[tid*spacing] = (dice(rng) == 0);

            #pragma omp barrier
            #pragma omp single
            {
                cond_var = true;
                for (int i=0; i<num_threads; i++)
                    cond_var &= local_cond_var[i*spacing];
            }
            // implicit barrier here after the single (a nowait clause would remove it)
        } while(!cond_var);

        #pragma omp critical
        std::cout << "Thread " << tid << " finished with " << iter_count << " iterations." << std::endl;
    }

    return 0;
}

Note that the barriers, explicit or implicit, imply a flush of the shared variables; adding a nowait clause to the single constructs would therefore cause intermittent deadlocks.

Putting a #pragma omp barrier after the last statement in the loop body did not cause a deadlock for me, but it is not sufficient either. Although the worker threads wait at the barrier until they can all pass through together, that does not ensure that they have a consistent view of cond_var on the other side. If on some iteration the first thread(s) to update cond_var leave it true, then some or all of those threads may perform another iteration even though another thread later sets it to false. Only when those threads return to the atomic update are they certain to see the value written by the other threads.

You should be able to work around that issue by performing an atomic read of the condition variable after the barrier, before testing the loop condition. You need to do that, or something else that solves the problem, because OpenMP requires every thread in the team to encounter the barrier the same number of times, and the code as written violates that constraint. In fact, that is likely the reason your program hangs: the threads that perform an extra iteration are stuck waiting for the others at the barrier.
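A minimal sketch of that suggested fix, applied to the MVCE above (the function wrapper and the reporting of thread 0's count are assumptions added for illustration, not the asker's real workload): after the final barrier, no thread writes cond_var until the single in the next iteration, so every atomic snapshot taken in that window sees the same value and all threads make the same exit decision.

```cpp
#include <random>

// Corrected loop: reset -> (implicit barrier) -> update -> barrier ->
// atomic snapshot -> test. Between the last barrier and the next reset no
// thread writes cond_var, so all snapshots agree and the threads exit the
// loop together. Returns the iteration count observed by thread 0.
unsigned run_sync_loop() {
    bool cond_var = true;
    unsigned iters_thread0 = 0;

    #pragma omp parallel default(none) shared(cond_var, iters_thread0)
    {
        std::random_device rd;
        std::mt19937 rng(rd());
        std::uniform_int_distribution<int> dice(0, 1);
        unsigned iter_count = 0;
        bool local_copy;

        do {
            ++iter_count;

            #pragma omp barrier      // no thread may still be testing the old value
            #pragma omp single
            cond_var = true;         // implicit barrier after the single

            const bool some_private_var = (dice(rng) == 0);

            #pragma omp atomic update
            cond_var &= some_private_var;

            #pragma omp barrier      // all updates finish before anyone reads

            #pragma omp atomic read
            local_copy = cond_var;   // same snapshot for every thread
        } while (!local_copy);

        #pragma omp master
        iters_thread0 = iter_count;
    }
    return iters_thread0;
}
```

With a 0/1 die the expected iteration count grows roughly as 2**nthreads, so this only demonstrates the synchronization pattern; the real work would replace the dice roll.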
