
How to synchronize the value of a variable among all threads?

If I have the following case:

bool cond_var;

#pragma omp parallel shared(cond_var)
{
    bool some_private_var;
    // ...
    do {
       #pragma omp single
       {
           cond_var = true;
       }

       // do something, calculate some_private_var;
       // ...

       #pragma omp atomic update
       cond_var &= some_private_var;

       // Syncing step
       // (???)

    } while(cond_var);

    // ... (other parallel stuff)
}

I want my do-while loop to have the same number of iterations for all my threads, but when I tried #pragma omp barrier as the syncing step (just before the end of the loop), I ended up with a deadlock. Printing the value of cond_var showed that some threads saw it as true while others saw it as false, so the loop finished for some threads, leaving the others deadlocked on the barrier. I then tried various combinations and orderings of barrier and flush, with no luck (with some combinations the deadlock was merely postponed).
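For concreteness, the deadlocking attempt placed the barrier at the (???) placeholder above (a sketch of what was just described):

    #pragma omp atomic update
    cond_var &= some_private_var;

    // Attempted syncing step: wait here so that every thread tests
    // the same, fully-updated cond_var; this is what deadlocked.
    #pragma omp barrier

} while(cond_var);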

How do I properly combine and synchronize the loop condition among the threads so that all threads execute the same number of iterations?

UPDATE

I have also tried loading the value of cond_var into another private variable with #pragma omp atomic read and testing that variable instead. It didn't work either. Apparently, an atomic read guarantees that I get a consistent value (either the old or the new one), but it doesn't guarantee that it is the latest one.
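For reference, the attempted read looked roughly like this (a sketch reconstructed from the description above; private_copy is an illustrative name, and the loop then tests private_copy instead of cond_var):

    bool private_copy;

    // Atomically snapshot the shared flag into a private variable.
    #pragma omp atomic read
    private_copy = cond_var;

    // The read cannot be torn, but it may still return a stale value:
    // another thread's update, or the next iteration's reset, may not
    // be visible yet, so threads can still disagree on the condition.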

UPDATE 2

Based on Jonathan Dursi's code, here is an MVCE that looks more like what I am trying to do:

#include <omp.h>
#include <cstdio>
#include <random>
#include <chrono>
#include <thread>

int main() {

    bool cond_var;
    const int nthreads = omp_get_max_threads();

    #pragma omp parallel default(none) shared(cond_var)
    {
        bool some_private_var;
        std::random_device rd;
        std::mt19937 rng(rd());
        unsigned iter_count = 0;

        /* chance of having to end: 1 in 6**nthreads; all threads must choose 0 */
        std::uniform_int_distribution<int> dice(0,5);

        const int tid = omp_get_thread_num();
        printf("Thread %d started.\n", tid);
        do {
            ++iter_count;

            #pragma omp once shared(cond_var)
            {
                // cond_var must be reset to 'true' because it is the
                // neutral element of &
                // For the loop to end, all threads must choose the
                // same random value 0
                cond_var = true;
            }

            some_private_var = (dice(rng) == 0);

            // If all threads choose 0, cond_var will remain 'true', ending the loop
            #pragma omp atomic update
            cond_var &= some_private_var;

            #pragma omp barrier
        } while(!cond_var);
        printf("Thread %d finished with %u iterations.\n", tid, iter_count);
    }

    return 0;
}

Running with 8 threads on a machine with enough logical cores to run all of them simultaneously, most runs deadlock in the first iteration, although one run finished correctly in the second iteration (far sooner than the 1-in-1679616 (6**8) chance of all threads choosing 0 would suggest).

The issue is that within the loop you are updating cond_var twice and using it a third time, and you need to ensure that these operations don't interfere with each other. On each loop iteration, the code:

  1. sets cond_var = true (using a non-existent OpenMP pragma, "once", which is ignored, so the assignment is done by every thread);
  2. updates cond_var by and-ing it with each thread's local condition variable;
  3. uses the updated-by-everyone cond_var to test whether to exit the loop.

Thus, one needs to make sure that no thread is setting cond_var to true (1) while other threads are and-ing it (2); that no thread is still and-ing (2) while others use it in the loop test (3); and that no thread is testing it (3) while another is setting it back to true (1).

The obvious way to do that is with barriers, one between each of those three phases - so three barriers. So this works:

#include <omp.h>
#include <random>
#include <chrono>
#include <thread>
#include <iostream>

int main() {

    bool cond_var;

    #pragma omp parallel default(none) shared(cond_var,std::cout)
    {
        bool some_private_var;
        std::random_device rd;
        std::mt19937 rng(rd());
        unsigned iter_count = 0;

        std::uniform_int_distribution<int> dice(0,1);

        const int tid = omp_get_thread_num();
        printf("Thread %d started.\n", tid);
        do {
            ++iter_count;

            #pragma omp barrier
            #pragma omp single 
            cond_var = true;
            // implicit barrier here at the end of the single; a nowait clause would remove it.

            some_private_var = (dice(rng) == 0);

            // If all threads choose 0, cond_var will remain 'true', ending the loop
            #pragma omp atomic update
            cond_var &= some_private_var;

            #pragma omp barrier
        } while(!cond_var);

        #pragma omp critical
        std::cout << "Thread " << tid << " finished with " << iter_count << " iterations." << std::endl;
    }

    return 0;
}

You can do a little better by having every thread write only its own slot in a shared array and letting a single thread do the and-ing. You then still need two barriers: one to make sure every thread has written its slot before the and-ing, and one (the implicit barrier at the end of the single) to make sure the and-ing is done before the threads test for completion:

#include <omp.h>
#include <random>
#include <chrono>
#include <thread>
#include <iostream>

int main() {

    bool cond_var;

    const int num_threads = omp_get_max_threads();
    const unsigned int spacing=64/sizeof(bool);  /* to avoid false sharing */
    bool local_cond_var[num_threads*spacing];

    #pragma omp parallel default(none) shared(cond_var,std::cout,local_cond_var)
    {
        std::random_device rd;
        std::mt19937 rng(rd());
        unsigned iter_count = 0;

        std::uniform_int_distribution<int> dice(0,1);

        const int tid = omp_get_thread_num();
        printf("Thread %d started.\n", tid);
        do {
            ++iter_count;

            local_cond_var[tid*spacing] = (dice(rng) == 0);

            #pragma omp barrier
            #pragma omp single
            {
                cond_var = true;
                for (int i=0; i<num_threads; i++)
                    cond_var &= local_cond_var[i*spacing];
            }
            // implicit barrier here at the end of the single; a nowait clause would remove it.
        } while(!cond_var);

        #pragma omp critical
        std::cout << "Thread " << tid << " finished with " << iter_count << " iterations." << std::endl;
    }

    return 0;
}

Note that the barriers, explicit or implicit, imply a flush of the shared variables; adding a nowait clause to either single would remove its implicit barrier and flush, and would then cause intermittent deadlocks.
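To see the dependence concretely, this is the variant to avoid (a sketch; nowait removes the implicit barrier at the end of the single, together with the flush it implies):

    #pragma omp single nowait
    cond_var = true;
    // No implicit barrier here any more: other threads may reach the
    // atomic update, or even the loop test, before the reset above is
    // performed or made visible to them, which produces the
    // intermittent deadlocks mentioned above.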

Putting a #pragma omp barrier after the last statement in the body of the loop did not cause a deadlock for me, but it's not sufficient either. Although the worker threads will wait at the barrier until they can all pass through together, that does not ensure that they have a consistent view of cond_var on the other side. If, on any iteration, the first thread(s) to update cond_var leave it true, then some or all of those threads may perform another iteration despite another thread later setting it false. Only when those threads return to the atomic update are they certain to see the value written by the other threads.

You should be able to work around that issue by performing an atomic read of the condition variable after the barrier, before testing the loop condition. You need to do that, or something else that solves the problem, because OpenMP does not allow different threads of the same team to encounter a barrier a different number of times. In fact, that is likely why your program hangs: the threads that perform an extra iteration are stuck at the barrier, waiting for threads that have already left the loop.
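A sketch of that workaround applied to the end of the loop body from the MVCE (my_cond is an illustrative name; the reset of cond_var at the top of the loop still needs its own synchronization, as in the answer above):

    bool my_cond;
    // ...
    #pragma omp atomic update
    cond_var &= some_private_var;

    #pragma omp barrier

    // Read the fully-updated flag atomically into a private copy, so
    // the loop condition does not re-read the shared variable after
    // other threads may already have started the next iteration.
    #pragma omp atomic read
    my_cond = cond_var;
} while(!my_cond);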
