Writing a (spinning) thread barrier using C++11 atomics

Question

I'm trying to familiarize myself with C++11 atomics, so I tried writing a barrier class for threads (before someone complains about not using existing classes: this is more for learning/self-improvement than due to any real need). My class basically looks as follows:

class barrier
{
private:
    std::atomic<int> counter[2];
    std::atomic<int> lock[2];
    std::atomic<int> cur_idx;
    int thread_count;
public:
    //constructors...
    bool wait();
};

All members are initialized to zero, except thread_count, which holds the appropriate count. I have implemented the wait function as

int idx  = cur_idx.load();
if(lock[idx].load() == 0)
{
    lock[idx].store(1);
}
int val = counter[idx].fetch_add(1);
if(val >= thread_count - 1)
{
    counter[idx].store(0);
    cur_idx.fetch_xor(1);
    lock[idx].store(0);
    return true;
}
while(lock[idx].load() == 1);
return false;

However, when trying to use it with two threads (thread_count is 2), the first thread gets into the wait loop just fine, but the second thread doesn't unlock the barrier (it seems it doesn't even get to int val = counter[idx].fetch_add(1);, but I'm not too sure about that). However, when I use the gcc atomic intrinsics, with volatile int instead of std::atomic<int>, and write wait as follows:

int idx = cur_idx;
if(lock[idx] == 0)
{
    __sync_val_compare_and_swap(&lock[idx], 0, 1);
}
int val = __sync_fetch_and_add(&counter[idx], 1);
if(val >= thread_count - 1)
{
    __sync_synchronize();
    counter[idx] = 0;
    cur_idx ^= 1;
    __sync_synchronize();
    lock[idx] = 0;
    __sync_synchronize();
    return true;
}
while(lock[idx] == 1);
return false;

it works just fine. From my understanding there shouldn't be any fundamental differences between the two versions (if anything, the second should be less likely to work). So which of the following scenarios applies?

1. I got lucky with the second implementation and my algorithm is crap.
2. I didn't fully understand std::atomic and there is a problem with the first variant (but not the second).
3. It should work, but the experimental implementation of the C++11 libraries isn't as mature as I had hoped.

For the record, I'm using 32-bit mingw with gcc 4.6.1.

The calling code looks like this:

spin_barrier b(2);
std::thread t([&b]()->void
{
    std::this_thread::sleep_for(std::chrono::duration<double>(0.1));
    b.wait();
});
b.wait();
t.join();

Since mingw doesn't have the <thread> header yet, I use a self-written version that basically wraps the appropriate pthread functions (before someone asks: yes, it works without the barrier, so it shouldn't be a problem with the wrapping). Any insights would be appreciated.

edit: Explanation of the algorithm to make it clearer:

thread_count is the number of threads that must wait at the barrier (so once thread_count threads are in the barrier, all of them can leave it).
lock is set to one when the first (or any) thread enters the barrier.
counter counts how many threads are inside the barrier and is atomically incremented once for each thread.
If counter >= thread_count, all threads are inside the barrier, so counter and lock are reset to zero.
Otherwise the thread waits for lock to become zero.
The next use of the barrier uses a different pair of variables (counter, lock), which ensures there are no problems if threads are still waiting on the first use of the barrier (e.g. they had been preempted when the barrier was lifted).
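The steps above can be put together as a complete class. This is only a minimal sketch, assuming the default sequentially consistent ordering on every operation; the names two_phase_barrier and count_last_arrivals are hypothetical and do not appear in the original post.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Sketch of the barrier described above (class name is illustrative).
class two_phase_barrier
{
public:
    explicit two_phase_barrier(int n) : cur_idx(0), thread_count(n)
    {
        for (int i = 0; i < 2; ++i)
        {
            counter[i].store(0);
            lock[i].store(0);
        }
    }

    // Returns true for the last thread to arrive, false for all others.
    bool wait()
    {
        const int idx = cur_idx.load();
        if (lock[idx].load() == 0)
            lock[idx].store(1);              // arm the lock for this use
        const int val = counter[idx].fetch_add(1);
        if (val >= thread_count - 1)
        {
            counter[idx].store(0);           // reset for a later reuse of this slot
            cur_idx.fetch_xor(1);            // next use takes the other slot
            lock[idx].store(0);              // release the spinning threads
            return true;
        }
        while (lock[idx].load() == 1)
            ;                                // spin until released
        return false;
    }

private:
    std::atomic<int> counter[2];
    std::atomic<int> lock[2];
    std::atomic<int> cur_idx;
    int thread_count;
};

// Hypothetical helper: runs `threads` threads through `rounds` uses of the
// barrier and counts how often wait() reported "last thread to arrive".
int count_last_arrivals(int threads, int rounds)
{
    assert(threads > 0 && rounds >= 0);
    two_phase_barrier b(threads);
    std::atomic<int> last(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int r = 0; r < rounds; ++r)
                if (b.wait())
                    last.fetch_add(1);
        });
    for (auto& th : pool)
        th.join();
    return last.load();
}
```

If the barrier behaves as described, exactly one thread per round sees true, so count_last_arrivals(2, 100) should return 100.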

edit2: I have now tested it using gcc 4.5.1 under Linux, where both versions seem to work just fine. That seems to point to a problem with mingw's std::atomic, but I'm still not completely convinced, since looking into the <atomic> header revealed that most functions simply call the appropriate gcc atomic builtin, meaning there really shouldn't be a difference between the two versions.

Answer 1

I have no idea if this is going to be of help, but the following snippet from Herb Sutter's implementation of a concurrent queue uses a spinlock based on atomics:

std::atomic<bool> consumerLock(false);  // must start out unlocked

{   // the critical section
    while (consumerLock.exchange(true)) { }  // this is the spinlock

    // do something useful

    consumerLock = false;  // unlock
}

In fact, the Standard provides a purpose-built type for this construction that is required to have lock-free operations, std::atomic_flag. With that, the critical section would look like this:

std::atomic_flag consumerLock = ATOMIC_FLAG_INIT;  // must start out cleared

{
    // critical section

    while (consumerLock.test_and_set()) { /* spin */ }

    // do stuff

    consumerLock.clear();
}

(You can use acquire and release memory ordering there if you prefer.)
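As a rough illustration of that last remark, here is the same flag-based lock with explicit acquire/release ordering, protecting a plain counter. This is a sketch only; shared_total, add_many and run_two_adders are made-up names for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic_flag consumerLock = ATOMIC_FLAG_INIT;
long shared_total = 0;  // plain variable, protected by consumerLock

// Adds 1 to shared_total n times, taking the spinlock for each increment.
void add_many(int n)
{
    for (int i = 0; i < n; ++i)
    {
        // acquire: later reads/writes cannot move above taking the lock
        while (consumerLock.test_and_set(std::memory_order_acquire)) { /* spin */ }

        ++shared_total;

        // release: earlier writes are visible to the next thread that acquires
        consumerLock.clear(std::memory_order_release);
    }
}

// Hypothetical driver: two threads each add n; returns the final total.
long run_two_adders(int n)
{
    assert(n >= 0);
    shared_total = 0;
    std::thread a(add_many, n);
    std::thread b(add_many, n);
    a.join();
    b.join();
    return shared_total;
}
```

With the lock in place no increment is lost, so run_two_adders(10000) should return 20000.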

Answer 2

Here is an elegant solution from the book C++ Concurrency in Action: Practical Multithreading.

struct bar_t {
    unsigned const count;
    std::atomic<unsigned> spaces;
    std::atomic<unsigned> generation;
    bar_t(unsigned count_) :
        count(count_), spaces(count_), generation(0)
    {}
    void wait() {
        unsigned const my_generation = generation;
        if (!--spaces) {
            spaces = count;
            ++generation;
        } else {
            while(generation == my_generation);
        }
    }
};

Answer 3

It looks needlessly complicated. Try this simpler version (well, I haven't tested it, I just meditated on it :)):

#include <atomic>

class spinning_barrier
{
public:
    spinning_barrier (unsigned int n) : n_ (n), nwait_ (0), step_(0) {}

    bool wait ()
    {
        unsigned int step = step_.load ();

        if (nwait_.fetch_add (1) == n_ - 1)
        {
            /* OK, last thread to come.  */
            nwait_.store (0); // XXX: maybe can use relaxed ordering here ??
            step_.fetch_add (1);
            return true;
        }
        else
        {
            /* Run in circles and scream like a little girl.  */
            while (step_.load () == step)
                ;
            return false;
        }
    }

protected:
    /* Number of synchronized threads. */
    const unsigned int n_;

    /* Number of threads currently spinning.  */
    std::atomic<unsigned int> nwait_;

    /* Number of barrier synchronizations completed so far,
     * it's OK to wrap.  */
    std::atomic<unsigned int> step_;
};

EDIT: @Grizzy, I can't find any errors in your first (C++11) version, and I've also run it for about a hundred million syncs with two threads and it completes. I ran it on a dual-socket/quad-core GNU/Linux machine though, so I'm rather inclined to suspect your option 3: the library (or rather, its port to win32) is not mature enough.
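For anyone who wants to repeat that kind of run, below is one way such a stress check could look. It is a sketch under my own assumptions, not the harness actually used above: count_early_exits is a hypothetical helper, and the barrier is just a compact copy of the class from this answer so that the snippet is self-contained.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Compact copy of the step-counting barrier from this answer.
class spinning_barrier
{
public:
    explicit spinning_barrier(unsigned int n) : n_(n), nwait_(0), step_(0) {}

    bool wait()
    {
        unsigned int step = step_.load();
        if (nwait_.fetch_add(1) == n_ - 1)
        {
            nwait_.store(0);      // last thread: reset the arrival count
            step_.fetch_add(1);   // and release everyone else
            return true;
        }
        while (step_.load() == step)
            ;                     // spin until the step changes
        return false;
    }

private:
    const unsigned int n_;
    std::atomic<unsigned int> nwait_;
    std::atomic<unsigned int> step_;
};

// Hypothetical stress helper: counts how often a thread got past round r
// of the barrier before every thread had announced that it reached round r.
int count_early_exits(unsigned int threads, int rounds)
{
    assert(threads > 0 && rounds >= 0);
    spinning_barrier barrier(threads);
    std::vector<std::atomic<int>> announced(threads);
    for (auto& a : announced)
        a.store(-1);
    std::atomic<int> early(0);

    std::vector<std::thread> pool;
    for (unsigned int id = 0; id < threads; ++id)
    {
        pool.emplace_back([&, id] {
            for (int r = 0; r < rounds; ++r)
            {
                announced[id].store(r);   // "I have reached round r"
                barrier.wait();
                for (auto& a : announced)
                    if (a.load() < r)     // somebody had not arrived yet
                        early.fetch_add(1);
            }
        });
    }
    for (auto& t : pool)
        t.join();
    return early.load();
}
```

A working barrier should give count_early_exits(3, 100) == 0; a nonzero result would mean some thread was released too soon.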

Answer 4

Here is a simple version of mine:

// spinning_mutex.hpp
#include <atomic>


class spinning_mutex
{
private:
    std::atomic<bool> lockVal;
public:
    spinning_mutex() : lockVal(false) { };

    void lock()
    {
        while(lockVal.exchange(true) );
    } 

    void unlock()
    {
        lockVal.store(false);
    }

    bool is_locked()
    {
        return lockVal.load();
    }
};

Usage (from the std::lock_guard example):

#include <thread>
#include <mutex>
#include "spinning_mutex.hpp"

int g_i = 0;
spinning_mutex g_i_mutex;  // protects g_i

void safe_increment()
{
    std::lock_guard<spinning_mutex> lock(g_i_mutex);
    ++g_i;

    // g_i_mutex is automatically released when lock
    // goes out of scope
}

int main()
{
    std::thread t1(safe_increment);
    std::thread t2(safe_increment);

    t1.join();
    t2.join();
}

Answer 5

I know the thread is a little bit old, but since it is still the first Google result when searching for a thread barrier using C++11 only, I want to present a solution that gets rid of the busy waiting by using std::condition_variable. Basically it is chill's solution, but instead of the while loop it uses std::condition_variable::wait() and std::condition_variable::notify_all(). In my tests it seems to work fine.

#include <condition_variable>
#include <mutex>


class SpinningBarrier
{
    public:
        SpinningBarrier (unsigned int threadCount) :
            threadCnt(threadCount),
            step(0),
            waitCnt(0)
        {}

        bool wait()
        {
            // Count arrivals under the mutex so that a waiter can never
            // sample step after the last thread has already advanced it.
            std::unique_lock<std::mutex> lock(mutex);
            const unsigned char s = step;

            if(++waitCnt >= threadCnt)
            {
                // Last thread to arrive: reset and release the others.
                waitCnt = 0;
                step += 1;

                condVar.notify_all();
                return true;
            }
            else
            {
                // Sleep until the last thread has advanced step.
                condVar.wait(lock, [&]{return step != s;});
                return false;
            }
        }
    private:
        const unsigned int threadCnt;
        unsigned char step;

        unsigned int waitCnt;   // guarded by mutex
        std::condition_variable condVar;
        std::mutex mutex;
};

Answer 6

Why not use std::atomic_flag (from C++11)?

http://en.cppreference.com/w/cpp/atomic/atomic_flag

std::atomic_flag is an atomic boolean type. Unlike all specializations of std::atomic, it is guaranteed to be lock-free.
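If you want to see what your own toolchain promises for std::atomic<int>, something like the following could be used. It is a small sketch; describe_atomic_int is a made-up helper name, while ATOMIC_INT_LOCK_FREE and is_lock_free() are the standard C++11 facilities from <atomic>.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical helper: reports what the implementation says about
// std::atomic<int>. ATOMIC_INT_LOCK_FREE is 0 (never lock-free),
// 1 (sometimes lock-free) or 2 (always lock-free).
std::string describe_atomic_int()
{
    std::atomic<int> probe(0);
    const bool runtime_lock_free = probe.is_lock_free();

    // If the macro says "always", the runtime query has to agree.
    assert(ATOMIC_INT_LOCK_FREE != 2 || runtime_lock_free);

    std::string text = "ATOMIC_INT_LOCK_FREE=";
    text += std::to_string(ATOMIC_INT_LOCK_FREE);
    text += runtime_lock_free ? ", is_lock_free()=true"
                              : ", is_lock_free()=false";
    return text;
}
```

Printing the returned string on the target platform shows whether std::atomic<int> may fall back to a lock there, which std::atomic_flag is never allowed to do.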

Here's how I would write my spinning thread barrier class:

#ifndef SPINLOCK_H
#define SPINLOCK_H

#include <atomic>
#include <thread>

class SpinLock
{
public:

    inline SpinLock() :
        m_lock(ATOMIC_FLAG_INIT)
    {
    }

    inline SpinLock(const SpinLock &) :
        m_lock(ATOMIC_FLAG_INIT)
    {
    }

    inline SpinLock &operator=(const SpinLock &)
    {
        return *this;
    }

    inline void lock()
    {
        while (true)
        {
            for (int i = 0; i < 10000; ++i)  // plain int: <cstdint> is not included
            {
                if (!m_lock.test_and_set(std::memory_order_acquire))
                {
                    return;
                }
            }

            std::this_thread::yield();  // A great idea that you don't see in many spinlock examples
        }
    }

    inline bool try_lock()
    {
        return !m_lock.test_and_set(std::memory_order_acquire);
    }

    inline void unlock()
    {
        m_lock.clear(std::memory_order_release);
    }

private:

    std::atomic_flag m_lock;
};

#endif

Answer 7

Stolen straight from the docs

spinlock.h

#include <atomic>

using namespace std;

/* Fast userspace spinlock */
class spinlock {
public:
    spinlock(std::atomic_flag& flag) : flag(flag) {
        while (flag.test_and_set(std::memory_order_acquire)) ;
    };
    ~spinlock() {
        flag.clear(std::memory_order_release);
    };
private:
    std::atomic_flag& flag; 
};

usage.cpp

#include "spinlock.h"

atomic_flag kartuliga = ATOMIC_FLAG_INIT;

void mutually_exclusive_function()
{
    spinlock lock(kartuliga);
    /* your shared-resource-using code here */
}

7 answers

Answer 1 · 23 · 2011-11-13 22:49:19
Answer 2 · 5 · 2014-07-16 09:31:45
Answer 3 · 5 · accepted · 2011-11-14 11:10:47
Answer 4 · 4 · 2011-11-16 12:15:03
Answer 5 · 2 · 2013-01-16 14:07:08
Answer 6 · 2 · 2014-08-13 12:52:41
Answer 7 · 1 · 2015-07-09 21:20:17
