What is the correct way to implement a thread barrier, and barrier resetting, in C?

Question

I tried to implement a simple barrier in my code that looks like this:

void waitOnBarrier(int* barrier, int numberOfThreads) {
    atomicIncrement(barrier); // atomic increment implemented in assembly
    while(*barrier < numberOfThreads);
}

The barrier is then used in the code like this:

int g_barrier = 0; // a global variable

waitOnBarrier(&g_barrier, someKnownNumberOfThreads);

So far so good, but where should I reset my g_barrier variable back to zero? If I write something like

g_barrier = 0;

right after the waitOnBarrier call, I will have a problem if one of the threads is released from the barrier faster than the others and zeroes g_barrier while the other threads are still executing the loop instructions. Those threads will then be stuck on the barrier forever.

Explanation: waitOnBarrier will compile into something like this (pseudocode):

1: mov rax, numberOfThreads
2: mov rbx, [barrier]
3: cmp rax, rbx
4: jmp(if smaller) to 2

Suppose we have 2 threads syncing on the barrier, and thread_1 is slow somewhere around instruction 3 or 4, while a faster thread_2 reaches the barrier, passes it, and continues on to zero g_barrier. When thread_1 gets back to instruction 2, it will see a zero value at [barrier] and will be stuck on the barrier forever!

The question is: how should I zero g_barrier? What place in the code is "far enough" that I can be sure all the threads have left the barrier by then? Or is there a more correct way to implement a barrier?

Answer 1

Barriers are actually quite difficult to implement. The main reason is that new waiters can begin arriving before all the old waiters have had a chance to run, which precludes any kind of simple count-based implementation. My preferred solution is to have the barrier object itself simply point to a "current barrier instance" that lives on the stack of the first thread to arrive at the barrier. That thread will also be the last to leave, since it cannot leave while other threads are still referencing its stack. A very nice sample implementation in terms of pthread primitives (which could be adapted to C11 locking primitives or whatever you have to work with) is included in Michael Burr's answer to my past question on the topic:

https://stackoverflow.com/a/5902671/379897

Yes, it looks like a lot of work, but writing a barrier implementation that actually satisfies the contract of a barrier is non-trivial.
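To make the "instance on the first arriver's stack" idea concrete, here is a minimal sketch in C with pthreads. It is my own illustration of the idea, not the code from the linked answer; the names stack_barrier, barrier_instance, and stack_barrier_demo are hypothetical, and error checking is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* One round of the barrier; lives on the stack of the round's first arriver. */
struct barrier_instance {
    unsigned arrived;  /* threads that have entered this round */
    unsigned inside;   /* non-owner threads still referencing this instance */
    int released;      /* set once the last thread of the round has arrived */
};

struct stack_barrier {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned count;                    /* threads per round */
    struct barrier_instance *current;  /* NULL when no round is open */
};

void stack_barrier_init(struct stack_barrier *b, unsigned count) {
    assert(count > 0);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->count = count;
    b->current = NULL;
}

void stack_barrier_wait(struct stack_barrier *b) {
    if (b->count == 1)
        return;  /* a one-thread barrier never blocks */
    pthread_mutex_lock(&b->lock);
    if (b->current == NULL) {
        /* First arriver: open a new round on our own stack. */
        struct barrier_instance inst = { .arrived = 1, .inside = 0, .released = 0 };
        b->current = &inst;
        /* Stay until the round is released and nobody else still uses inst. */
        while (!inst.released || inst.inside > 0)
            pthread_cond_wait(&b->cond, &b->lock);
    } else {
        struct barrier_instance *inst = b->current;
        inst->arrived++;
        inst->inside++;
        if (inst->arrived == b->count) {
            /* Last arriver: detach the round so later arrivals open a fresh one. */
            b->current = NULL;
            inst->released = 1;
            pthread_cond_broadcast(&b->cond);
        } else {
            while (!inst->released)
                pthread_cond_wait(&b->cond, &b->lock);
        }
        /* Done with inst; the final decrement lets the owner leave. */
        if (--inst->inside == 0)
            pthread_cond_broadcast(&b->cond);
    }
    pthread_mutex_unlock(&b->lock);
}

/* Hypothetical stress test: returns the number of barrier violations seen. */
enum { THREADS = 4, ROUNDS = 200 };
static struct stack_barrier g_bar;
static atomic_int g_arrived, g_violations;

static void *worker(void *arg) {
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        atomic_fetch_add(&g_arrived, 1);
        stack_barrier_wait(&g_bar);
        /* Past the barrier, every thread must have arrived for round r. */
        if (atomic_load(&g_arrived) < (r + 1) * THREADS)
            atomic_fetch_add(&g_violations, 1);
    }
    return NULL;
}

int stack_barrier_demo(void) {
    pthread_t t[THREADS];
    atomic_store(&g_arrived, 0);
    atomic_store(&g_violations, 0);
    stack_barrier_init(&g_bar, THREADS);
    for (int i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
    return atomic_load(&g_violations);
}
```

Nothing is ever reset here: each round gets a fresh instance, so a fast thread that re-enters the barrier simply opens the next round while slow threads are still leaving the previous one.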

Answer 2

Try implementing the barrier solution explained in this book:

The Little Book of Semaphores
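For readers without the book at hand, the reusable barrier it describes uses two "turnstile" semaphores so that nobody can lap the others. The following is a rough sketch of that scheme with unnamed POSIX semaphores (assuming a platform that supports sem_init, such as Linux). The names sem_barrier and sem_barrier_demo are hypothetical, and error handling (for example EINTR from sem_wait) is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>

struct sem_barrier {
    int n;             /* threads per round */
    int count;         /* threads currently between the two phases */
    sem_t mutex;       /* protects count */
    sem_t turnstile;   /* closed until all n threads have arrived */
    sem_t turnstile2;  /* closed until all n threads have passed turnstile */
};

void sem_barrier_init(struct sem_barrier *b, int n) {
    assert(n > 0);
    b->n = n;
    b->count = 0;
    sem_init(&b->mutex, 0, 1);
    sem_init(&b->turnstile, 0, 0);   /* first turnstile starts closed */
    sem_init(&b->turnstile2, 0, 1);  /* second turnstile starts open */
}

void sem_barrier_wait(struct sem_barrier *b) {
    /* Phase 1: the last arriver closes turnstile2 and opens turnstile. */
    sem_wait(&b->mutex);
    if (++b->count == b->n) {
        sem_wait(&b->turnstile2);
        sem_post(&b->turnstile);
    }
    sem_post(&b->mutex);
    sem_wait(&b->turnstile);  /* pass through one at a time... */
    sem_post(&b->turnstile);  /* ...and let the next thread in */

    /* Phase 2: the last thread through closes turnstile and opens turnstile2,
       which restores the initial state for the next round. */
    sem_wait(&b->mutex);
    if (--b->count == 0) {
        sem_wait(&b->turnstile);
        sem_post(&b->turnstile2);
    }
    sem_post(&b->mutex);
    sem_wait(&b->turnstile2);
    sem_post(&b->turnstile2);
}

/* Hypothetical stress test: returns the number of barrier violations seen. */
enum { THREADS = 4, ROUNDS = 200 };
static struct sem_barrier g_bar;
static atomic_int g_arrived, g_violations;

static void *worker(void *arg) {
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        atomic_fetch_add(&g_arrived, 1);
        sem_barrier_wait(&g_bar);
        /* Past the barrier, every thread must have arrived for round r. */
        if (atomic_load(&g_arrived) < (r + 1) * THREADS)
            atomic_fetch_add(&g_violations, 1);
    }
    return NULL;
}

int sem_barrier_demo(void) {
    pthread_t t[THREADS];
    atomic_store(&g_arrived, 0);
    atomic_store(&g_violations, 0);
    sem_barrier_init(&g_bar, THREADS);
    for (int i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
    return atomic_load(&g_violations);
}
```

The second phase is what answers the "where do I reset?" question: the count is brought back to zero behind a closed turnstile, so no thread can start the next round until every thread has left the current one.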

Answer 3

Do not reset your barrier variable back to zero.

When any of the threads is about to exit, atomically decrement the barrier variable by one.

From your barrier, it looks like you do not want the number of spawned worker threads to fall below numberOfThreads.
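One way to take the "never reset" advice literally is to let the counter grow without bound and have each thread compute the value it must wait for. Note that this is a different scheme from the decrement suggested above, offered only as a sketch. It uses C11 atomics in place of the hand-written atomicIncrement; the names spin_barrier and spin_barrier_demo are hypothetical, and counter overflow is ignored.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

struct spin_barrier {
    atomic_ulong arrivals;  /* total arrivals so far; never reset */
    unsigned long n;        /* threads per round */
};

void spin_barrier_init(struct spin_barrier *b, unsigned long n) {
    assert(n > 0);
    atomic_init(&b->arrivals, 0);
    b->n = n;
}

void spin_barrier_wait(struct spin_barrier *b) {
    /* The ticket tells us which round we belong to: round k owns
       tickets k*n .. (k+1)*n - 1. */
    unsigned long ticket = atomic_fetch_add(&b->arrivals, 1);
    unsigned long target = (ticket / b->n + 1) * b->n;
    /* Wait until every ticket of our round has been taken. */
    while (atomic_load(&b->arrivals) < target)
        sched_yield();  /* be polite while spinning */
}

/* Hypothetical stress test: returns the number of barrier violations seen. */
enum { THREADS = 4, ROUNDS = 200 };
static struct spin_barrier g_bar;
static atomic_int g_arrived, g_violations;

static void *worker(void *arg) {
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        atomic_fetch_add(&g_arrived, 1);
        spin_barrier_wait(&g_bar);
        /* Past the barrier, every thread must have arrived for round r. */
        if (atomic_load(&g_arrived) < (r + 1) * THREADS)
            atomic_fetch_add(&g_violations, 1);
    }
    return NULL;
}

int spin_barrier_demo(void) {
    pthread_t t[THREADS];
    atomic_store(&g_arrived, 0);
    atomic_store(&g_violations, 0);
    spin_barrier_init(&g_bar, THREADS);
    for (int i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
    return atomic_load(&g_violations);
}
```

Because a thread can only draw a ticket from the next round after the current round's target has been reached, a fast thread that loops back around cannot disturb the threads still spinning on the previous target.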

Answer 4

I came across this question when trying to do something similar, so I thought I'd share my solution in case someone else finds it useful. It's implemented in pure C++11 (sadly not C11, since the multithreading part of that standard is not yet supported in gcc and msvc).

Basically, you maintain two counters, whose usage is alternated. Below is the implementation and a usage example:

    #include <cstddef>
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <condition_variable>

    // A barrier class; The barrier is initialized with the number of expected threads to synchronize
    class barrier_t{
        const size_t count;
        size_t counter[2], * currCounter;
        std::mutex mutex;
        std::condition_variable cv;

    public:
        barrier_t(size_t count) : count(count), currCounter(&counter[0]) {
            counter[0] = count;
            counter[1] = 0;
        }
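        void wait(){
            std::unique_lock<std::mutex> lock(mutex);
            if (!--*currCounter){
                // Last thread to arrive: switch to the other counter,
                // arm it for the next round, and wake the waiters.
                currCounter += currCounter == counter ? 1 : -1;
                *currCounter = count;
                cv.notify_all();
            }
            else {
                // Wait on the counter this round started with; it stays at
                // zero until the next round has completed.
                size_t * currCounter_local = currCounter;
                cv.wait(lock, [currCounter_local]{return *currCounter_local == 0; });
            }
        }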
    };

    void testBarrier(size_t iters, size_t threadIdx, barrier_t *B){
        for(size_t i = 0; i < iters; i++){
            printf("Hello from thread %zu for the %zuth time!\n", threadIdx, i);
            B->wait();
        }
    }

    int main(void){
        const size_t threadCnt = 4, iters = 8;
        barrier_t B(threadCnt);
        std::thread t[threadCnt];   
        for(size_t i = 0; i < threadCnt; i++) t[i] = std::thread(testBarrier, iters, i, &B);
        for(size_t i = 0; i < threadCnt; i++) t[i].join();
        return 0;
    }
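Since the question asks about C, the same alternating-counter idea can also be sketched with pthreads. This is a loose port rather than a drop-in equivalent: the names two_counter_barrier and two_counter_barrier_demo are hypothetical, an index replaces the pointer into the counter array, and error checking is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct two_counter_barrier {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    size_t count;       /* threads per round */
    size_t counter[2];  /* remaining arrivals; the two slots alternate */
    int cur;            /* index of the slot used by the current round */
};

void two_counter_barrier_init(struct two_counter_barrier *b, size_t count) {
    assert(count > 0);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->count = count;
    b->counter[0] = count;
    b->counter[1] = 0;
    b->cur = 0;
}

void two_counter_barrier_wait(struct two_counter_barrier *b) {
    pthread_mutex_lock(&b->lock);
    int mine = b->cur;  /* remember which slot this round uses */
    if (--b->counter[mine] == 0) {
        /* Last arriver: arm the other slot for the next round, then release. */
        b->cur = 1 - mine;
        b->counter[b->cur] = b->count;
        pthread_cond_broadcast(&b->cond);
    } else {
        /* Our slot stays at zero until the next round has finished, and that
           cannot happen before this thread has left the barrier. */
        while (b->counter[mine] != 0)
            pthread_cond_wait(&b->cond, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

/* Hypothetical stress test: returns the number of barrier violations seen. */
enum { THREADS = 4, ROUNDS = 200 };
static struct two_counter_barrier g_bar;
static atomic_int g_arrived, g_violations;

static void *worker(void *arg) {
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        atomic_fetch_add(&g_arrived, 1);
        two_counter_barrier_wait(&g_bar);
        /* Past the barrier, every thread must have arrived for round r. */
        if (atomic_load(&g_arrived) < (r + 1) * THREADS)
            atomic_fetch_add(&g_violations, 1);
    }
    return NULL;
}

int two_counter_barrier_demo(void) {
    pthread_t t[THREADS];
    atomic_store(&g_arrived, 0);
    atomic_store(&g_violations, 0);
    two_counter_barrier_init(&g_bar, THREADS);
    for (int i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
    return atomic_load(&g_violations);
}
```

The slot a round finished on is only re-armed by the last arriver of the following round, which is what makes the "reset" safe: by then every waiter of the earlier round must already have returned.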

4 answers

Answer 1: score 3, accepted, 2014-10-07 11:28:10
Answer 2: score 0, 2014-10-07 09:10:10
Answer 3: score 0, 2014-10-07 10:11:41
Answer 4: score 0, 2015-03-18 23:11:22
