Writing a (spinning) thread barrier using C++11 atomics

Question

I'm trying to familiarize myself with C++11 atomics, so I tried writing a barrier class for threads (before someone complains about not using existing classes: this is more for learning and self-improvement than due to any real need). My class basically looks as follows:

class barrier
{
private:
    std::atomic<int> counter[2];
    std::atomic<int> lock[2];
    std::atomic<int> cur_idx;
    int thread_count;
public:
    //constructors...
    bool wait();
};

All members are initialized to zero, except thread_count, which holds the appropriate count. I have implemented the wait function as

int idx  = cur_idx.load();
if(lock[idx].load() == 0)
{
    lock[idx].store(1);
}
int val = counter[idx].fetch_add(1);
if(val >= thread_count - 1)
{
    counter[idx].store(0);
    cur_idx.fetch_xor(1);
    lock[idx].store(0);
    return true;
}
while(lock[idx].load() == 1);
return false;

However, when trying to use it with two threads (thread_count is 2), the first thread gets into the wait loop just fine, but the second thread doesn't unlock the barrier (it seems it doesn't even get to int val = counter[idx].fetch_add(1);, but I'm not too sure about that). However, when I use the gcc atomic intrinsics, with volatile int instead of std::atomic<int>, and write wait as follows:

int idx = cur_idx;
if(lock[idx] == 0)
{
    __sync_val_compare_and_swap(&lock[idx], 0, 1);
}
int val = __sync_fetch_and_add(&counter[idx], 1);
if(val >= thread_count - 1)
{
    __sync_synchronize();
    counter[idx] = 0;
    cur_idx ^= 1;
    __sync_synchronize();
    lock[idx] = 0;
    __sync_synchronize();
    return true;
}
while(lock[idx] == 1);
return false;

it works just fine. From my understanding there shouldn't be any fundamental differences between the two versions (more to the point if anything the second should be less likely to work). So which of the following scenarios applies?

1. I got lucky with the second implementation and my algorithm is crap
2. I didn't fully understand std::atomic and there is a problem with the first variant (but not the second)
3. It should work, but the experimental implementation of the C++11 libraries isn't as mature as I had hoped

For the record I'm using 32bit mingw with gcc 4.6.1

The calling code looks like this:

spin_barrier b(2);
std::thread t([&b]()->void
{
    std::this_thread::sleep_for(std::chrono::duration<double>(0.1));
    b.wait();
});
b.wait();
t.join();

Since mingw doesn't have <thread> headers yet, I use a self-written version which basically wraps the appropriate pthread functions (before someone asks: yes, it works without the barrier, so it shouldn't be a problem with the wrapping). Any insights would be appreciated.

edit: Explanation for the algorithm to make it clearer:

thread_count is the number of threads which shall wait for the barrier (so if thread_count threads are in the barrier all can leave the barrier).
lock is set to one when the first (or any) thread enters the barrier.
counter counts how many threads are inside the barrier and is atomically incremented once for each thread
if counter >= thread_count, all threads are inside the barrier, so counter and lock are reset to zero
otherwise the thread waits for the lock to become zero
in the next use of the barrier, different variables (counter, lock) are used to ensure there are no problems if threads are still waiting on the first use of the barrier (e.g. they had been preempted when the barrier was lifted)

edit2: I have now tested it using gcc 4.5.1 under Linux, where both versions seem to work just fine. That seems to point to a problem with mingw's std::atomic, but I'm still not completely convinced, since looking into the <atomic> header revealed that most functions simply call the appropriate gcc atomic builtin, meaning there really shouldn't be a difference between the two versions.

Answer 1

I have no idea if this is going to be of help, but the following snippet from Herb Sutter's implementation of a concurrent queue uses a spinlock based on atomics:

std::atomic<bool> consumerLock;

{   // the critical section
    while (consumerLock.exchange(true)) { }  // this is the spinlock

    // do something useful

    consumerLock = false;  // unlock
}

In fact, the Standard provides a purpose-built type for this construction that is required to have lock-free operations, std::atomic_flag. With that, the critical section would look like this:

std::atomic_flag consumerLock = ATOMIC_FLAG_INIT;  // start in the clear state (unspecified before C++20 otherwise)

{
    // critical section

    while (consumerLock.test_and_set()) { /* spin */ }

    // do stuff

    consumerLock.clear();
}

(You can use acquire and release memory ordering there if you prefer.)
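For what it's worth, here is a rough sketch of what the acquire/release variant could look like. The names flag_spinlock and count_with_spinlock are made up for this illustration and are not part of Herb Sutter's queue; the only library calls used are test_and_set and clear with explicit memory orders.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper (not from the answer): a spinlock built on
// std::atomic_flag with explicit acquire/release ordering.
class flag_spinlock {
public:
    void lock() {
        // acquire: writes published by the previous unlock() become visible here
        while (flag_.test_and_set(std::memory_order_acquire)) { /* spin */ }
    }
    void unlock() {
        // release: publishes the writes made inside the critical section
        flag_.clear(std::memory_order_release);
    }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical demo: each of `threads` threads increments a plain int
// `iterations` times under the lock; the final total is returned.
int count_with_spinlock(int threads, int iterations) {
    flag_spinlock lock;
    int total = 0;  // deliberately not atomic; protected by `lock`
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &total, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++total;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

If the lock works, count_with_spinlock(4, 10000) should return exactly threads * iterations; a lost update would show up as a smaller total.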

Answer 2

Here is an elegant solution from the book C++ Concurrency in Action: Practical Multithreading.

struct bar_t {
    unsigned const count;
    std::atomic<unsigned> spaces;
    std::atomic<unsigned> generation;
    bar_t(unsigned count_) :
        count(count_), spaces(count_), generation(0)
    {}
    void wait() {
        unsigned const my_generation = generation;
        if (!--spaces) {
            spaces = count;
            ++generation;
        } else {
            while(generation == my_generation);
        }
    }
};
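To see what the generation counter buys you, here is a small sketch (my own harness, not from the book) that uses the same scheme to separate two phases of work. generation_barrier just restates bar_t so the snippet compiles on its own; the yield in the spin loop and the phases_are_ordered function are additions for this illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Same generation-counting scheme as bar_t, restated so this compiles alone.
struct generation_barrier {
    unsigned const count;
    std::atomic<unsigned> spaces;
    std::atomic<unsigned> generation;
    explicit generation_barrier(unsigned n) : count(n), spaces(n), generation(0) {}
    void wait() {
        unsigned const my_generation = generation.load();
        if (--spaces == 0) {
            // Last arrival: refill the slots first, then release the waiters.
            spaces.store(count);
            ++generation;
        } else {
            // Assumption of this sketch: yield while spinning so it also
            // behaves with more threads than cores.
            while (generation.load() == my_generation) { std::this_thread::yield(); }
        }
    }
};

// Hypothetical check: every thread writes its own slot (phase 1), waits at
// the barrier, then reads all slots (phase 2). Returns true if every thread
// saw every slot written.
bool phases_are_ordered(unsigned threads) {
    generation_barrier bar(threads);
    std::vector<unsigned> slots(threads, 0u);
    std::atomic<unsigned> complete_views(0);
    std::vector<std::thread> pool;
    for (unsigned id = 0; id < threads; ++id) {
        pool.emplace_back([&, id] {
            slots[id] = 1u;                      // phase 1: plain write to own slot
            bar.wait();                          // nobody passes until all have written
            unsigned seen = 0;
            for (unsigned v : slots) seen += v;  // phase 2: read everyone's slot
            if (seen == threads) ++complete_views;
        });
    }
    for (auto& t : pool) t.join();
    return complete_views.load() == threads;
}
```

Reading generation before decrementing spaces is the important part: a thread that is slow to start spinning still compares against the generation it arrived in, so it cannot miss the release.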

Answer 3

It looks needlessly complicated. Try this simpler version (well, I haven't tested it, I just meditated on it:))) :

#include <atomic>

class spinning_barrier
{
public:
    spinning_barrier (unsigned int n) : n_ (n), nwait_ (0), step_(0) {}

    bool wait ()
    {
        unsigned int step = step_.load ();

        if (nwait_.fetch_add (1) == n_ - 1)
        {
            /* OK, last thread to come.  */
            nwait_.store (0); // XXX: maybe can use relaxed ordering here ??
            step_.fetch_add (1);
            return true;
        }
        else
        {
            /* Run in circles and scream like a little girl.  */
            while (step_.load () == step)
                ;
            return false;
        }
    }

protected:
    /* Number of synchronized threads. */
    const unsigned int n_;

    /* Number of threads currently spinning.  */
    std::atomic<unsigned int> nwait_;

    /* Number of barrier synchronizations completed so far,
     * it's OK to wrap.  */
    std::atomic<unsigned int> step_;
};

EDIT: @Grizzy, I can't find any errors in your first (C++11) version and I've also run it for like a hundred million syncs with two threads and it completes. I've run it on a dual-socket/quad-core GNU/Linux machine though, so I'm rather inclined to suspect your option 3. - the library (or rather, its port to win32) is not mature enough.
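A stress run like the one described can be sketched roughly as follows. This is not the exact harness I used; spinning_barrier_sketch restates the class above (with a yield added to the spin loop as an assumption of this sketch), and count_last_arrivals is a made-up helper that counts how often wait() returns true.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Compact restatement of spinning_barrier so the sketch compiles alone.
class spinning_barrier_sketch {
public:
    explicit spinning_barrier_sketch(unsigned n) : n_(n), nwait_(0), step_(0) {}

    bool wait() {
        unsigned step = step_.load();
        if (nwait_.fetch_add(1) == n_ - 1) {
            nwait_.store(0);      // reset before releasing the waiters
            step_.fetch_add(1);   // releases everyone spinning on this step
            return true;
        }
        // Assumption: yield while spinning, to be kind to small machines.
        while (step_.load() == step) { std::this_thread::yield(); }
        return false;
    }

private:
    const unsigned n_;
    std::atomic<unsigned> nwait_;
    std::atomic<unsigned> step_;
};

// Hypothetical harness: runs `rounds` barrier syncs on `threads` threads and
// returns how many wait() calls reported "I was the last to arrive".
unsigned count_last_arrivals(unsigned threads, unsigned rounds) {
    spinning_barrier_sketch barrier(threads);
    std::atomic<unsigned> last_arrivals(0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&barrier, &last_arrivals, rounds] {
            for (unsigned r = 0; r < rounds; ++r) {
                if (barrier.wait()) ++last_arrivals;
            }
        });
    }
    for (auto& th : pool) th.join();
    return last_arrivals.load();
}
```

Exactly one thread per round should see true, so the returned count should equal rounds; a hang or a different count would point at the barrier (or at the atomics underneath it).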

Answer 4

Here is a simple version of mine :

// spinning_mutex.hpp
#include <atomic>


class spinning_mutex
{
private:
    std::atomic<bool> lockVal;
public:
    spinning_mutex() : lockVal(false) { };

    void lock()
    {
        while(lockVal.exchange(true) );
    } 

    void unlock()
    {
        lockVal.store(false);
    }

    bool is_locked()
    {
        return lockVal.load();
    }
};

Usage : (from std::lock_guard example)

#include <thread>
#include <mutex>
#include "spinning_mutex.hpp"

int g_i = 0;
spinning_mutex g_i_mutex;  // protects g_i

void safe_increment()
{
    std::lock_guard<spinning_mutex> lock(g_i_mutex);
    ++g_i;

    // g_i_mutex is automatically released when lock
    // goes out of scope
}

int main()
{
    std::thread t1(safe_increment);
    std::thread t2(safe_increment);

    t1.join();
    t2.join();
}

Answer 5

I know the thread is a little bit old, but since it is still the first Google result when searching for a thread barrier using C++11 only, I want to present a solution that gets rid of the busy waiting by using std::condition_variable. Basically it is chill's solution, but instead of the while loop it uses std::condition_variable::wait() and std::condition_variable::notify_all(). In my tests it seems to work fine.

#include <atomic>
#include <condition_variable>
#include <mutex>


class SpinningBarrier
{
    public:
        SpinningBarrier (unsigned int threadCount) :
            threadCnt(threadCount),
            step(0),
            waitCnt(0)
        {}

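        bool wait()
        {
            // Count arrivals under the mutex so a waiter cannot sample step
            // after the last thread has already advanced it.
            std::unique_lock<std::mutex> lock(mutex);

            if(waitCnt.fetch_add(1) >= threadCnt - 1)
            {
                // Last thread to arrive: reset the count and release the others.
                waitCnt.store(0);
                step += 1;

                condVar.notify_all();
                return true;
            }
            else
            {
                unsigned char s = step;

                // wait() returns once the predicate is true, i.e. once step has advanced.
                condVar.wait(lock, [&]{return step != s;});
                return false;
            }
        }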
    private:
        const unsigned int threadCnt;
        unsigned char step;

        std::atomic<unsigned int> waitCnt;
        std::condition_variable condVar;
        std::mutex mutex;
};
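To exercise a blocking barrier of this kind over many rounds, a small harness along these lines may help. It is only a sketch: blocking_barrier keeps a plain counter guarded by the mutex (my assumption here, not the exact class above), and blocking_rounds_completed is a made-up helper name.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a blocking barrier: all state is protected by one mutex.
class blocking_barrier {
public:
    explicit blocking_barrier(unsigned n) : threadCnt(n), waitCnt(0), step(0) {}

    bool wait() {
        std::unique_lock<std::mutex> lock(mutex);
        if (++waitCnt == threadCnt) {
            // Last arrival: start a new step and wake everybody up.
            waitCnt = 0;
            ++step;
            condVar.notify_all();
            return true;
        }
        const unsigned myStep = step;
        // wait(lock, pred) returns once pred() is true, so the predicate
        // describes the condition for leaving the barrier.
        condVar.wait(lock, [&] { return step != myStep; });
        return false;
    }

private:
    const unsigned threadCnt;
    unsigned waitCnt;   // arrivals in the current step, guarded by mutex
    unsigned step;      // completed steps, guarded by mutex (wrapping is fine)
    std::mutex mutex;
    std::condition_variable condVar;
};

// Hypothetical harness: returns how many rounds reported a last arrival.
unsigned blocking_rounds_completed(unsigned threads, unsigned rounds) {
    blocking_barrier barrier(threads);
    std::atomic<unsigned> completed(0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&barrier, &completed, rounds] {
            for (unsigned r = 0; r < rounds; ++r) {
                if (barrier.wait()) ++completed;
            }
        });
    }
    for (auto& th : pool) th.join();
    return completed.load();
}
```

Because the counter and the step are both read and written with the mutex held, a thread that arrives early always records the step it arrived in before the last thread can advance it, and spurious wakeups are handled by the predicate.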

Answer 6

Why not use std::atomic_flag (from C++11)?

http://en.cppreference.com/w/cpp/atomic/atomic_flag

std::atomic_flag is an atomic boolean type. Unlike all specializations of std::atomic, it is guaranteed to be lock-free.

Here's how I would write my spin lock class:

#ifndef SPINLOCK_H
#define SPINLOCK_H

#include <atomic>
#include <cstdint>  // int32_t
#include <thread>

class SpinLock
{
public:

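    inline SpinLock()
    {
        // ATOMIC_FLAG_INIT is only specified for the form "flag = ATOMIC_FLAG_INIT",
        // so clear the flag explicitly instead of using it in the initializer list.
        m_lock.clear();
    }

    inline SpinLock(const SpinLock &)
    {
        m_lock.clear();
    }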

    inline SpinLock &operator=(const SpinLock &)
    {
        return *this;
    }

    inline void lock()
    {
        while (true)
        {
            for (int32_t i = 0; i < 10000; ++i)
            {
                if (!m_lock.test_and_set(std::memory_order_acquire))
                {
                    return;
                }
            }

            std::this_thread::yield();  // A great idea that you don't see in many spinlock examples
        }
    }

    inline bool try_lock()
    {
        return !m_lock.test_and_set(std::memory_order_acquire);
    }

    inline void unlock()
    {
        m_lock.clear(std::memory_order_release);
    }

private:

    std::atomic_flag m_lock;
};

#endif

Answer 7

Stolen straight from docs

spinlock.h

#include <atomic>

using namespace std;

/* Fast userspace spinlock */
class spinlock {
public:
    // Spin until the flag is acquired; the destructor releases it (RAII).
    spinlock(std::atomic_flag& flag) : flag(flag) {
        while (flag.test_and_set(std::memory_order_acquire)) ;
    }
    ~spinlock() {
        flag.clear(std::memory_order_release);
    }
private:
    std::atomic_flag& flag;
};

usage.cpp

#include "spinlock.h"

atomic_flag kartuliga = ATOMIC_FLAG_INIT;

void mutually_exclusive_function()
{
    spinlock lock(kartuliga);
    /* your shared-resource-using code here */
}

7 answers

solution1 23 2011-11-13 22:49:19
solution2 5 2014-07-16 09:31:45
solution3 5 ACCEPTED 2011-11-14 11:10:47
solution4 4 2011-11-16 12:15:03
solution5 2 2013-01-16 14:07:08
solution6 2 2014-08-13 12:52:41
solution7 1 2015-07-09 21:20:17
