Correct atomic memory ordering for a "thread barrier" synchronization pattern in C++

Question

I need to properly synchronize access to a shared resource between a predefined number of worker threads (statically known via application config) and a predefined number of control-plane threads. The control-plane threads receive requests from the outside and, based on those, potentially modify the shared resource. Worker threads simply run an infinite loop in which the shared resource is only read. To do this in a thread-safe way, and given the actual application use case (network packet processing with multiple data-plane threads and multiple control-plane threads), it was decided to implement a "thread barrier" kind of pattern. Here's a snippet showing how it's done, assuming the application is configured to spawn 2 worker threads and 2 control-plane threads:

std::atomic_bool barrier{};
std::atomic_uint32_t workers_at_barrier{};

// called by control-plane threads only!
void barrier_lock()
{
    // optimized spinlock implementation: rigtorp.se/spinlock/
    while (true)
    {
        if (!barrier.exchange(true, std::memory_order_acquire))
            break;

        while (barrier.load(std::memory_order_relaxed))
            __builtin_ia32_pause();
    }
    assert(barrier);

    // wait for ALL worker (data-plane) threads to arrive at the barrier!
    while (workers_at_barrier.load() != 2);
    assert(workers_at_barrier.load() == 2);
}

// called by control-plane threads only!
void barrier_unlock()
{
    assert(barrier && workers_at_barrier.load() == 2);
    barrier.store(false, std::memory_order_release);

    // wait for ALL workers to get out of the barrier!
    while (workers_at_barrier.load() != 0);
}

struct barrier_lock_guard
{
    barrier_lock_guard()
    {
        barrier_lock();
    }

    ~barrier_lock_guard()
    {
        barrier_unlock();
    }
};

// control-plane threads receive some requests and handles them here
void handle_stuff()
{
    // ... stuff

    {
        barrier_lock_guard blg;

        // barrier should be set and all workers (2 in this case) should be waiting at the barrier for its release
        assert(barrier && workers_at_barrier.load() == 2);

        // ... writes to shared resource
    }

    // ... stuff
}

// called by worker threads only!
void wait_at_barrier()
{
    // immediately return if barrier is not set
    if (!barrier.load(std::memory_order_acquire))
        return;
    
    ++workers_at_barrier;

    // block at the barrier until it gets released
    while (barrier.load(std::memory_order_acquire));

    --workers_at_barrier;
}

// function run by the worker threads
void workers_stuff()
{
    while (true)
    {
        wait_at_barrier();

        // ... reads from shared resource
    }
}

The problem is that the assertion assert(barrier && workers_at_barrier.load() == 2); in handle_stuff() is getting hit. This occurs very rarely, so there must be something wrong, and I'm trying to understand exactly what and where. I'm pretty sure it has something to do with an incorrect use of std::memory_order. Can any C++ atomics pro point me to the exact issue and what the proper fix would be? Thanks in advance.

Answer 1

This is not a memory ordering issue, just a plain race. I can reproduce it even after upgrading all the memory orderings to sequential consistency. Here is my version on godbolt, though I can only reproduce the failure locally (godbolt only runs on one core).

The comment "wait for ALL workers to get out of the barrier!" in barrier_unlock() seems to point to the problem. This loop doesn't force another control thread to wait; that other thread could take the barrier right away.

Alternatively, observing the value workers_at_barrier == 2 in barrier_lock() does not prove that both worker threads are now waiting at the barrier; they may have already passed it while it was previously down, but not yet gotten around to decrementing the atomic counter.

So imagine the following sequence of events. We have control threads C1, C2 and worker threads W1, W2. C1 has taken the barrier and is just entering barrier_unlock(). C2 is just entering barrier_lock(). W1 and W2 are both spinning in the while (barrier.load()) loop in wait_at_barrier(), and workers_at_barrier has the value 2.

C1: barrier.store(false)
W1: barrier.load(): false, spin loop exits
C2: barrier.exchange(true): returns false. Break out of loop. Now barrier == true.
C2: assert(barrier) (passes)
C2: workers_at_barrier.load(): 2. The while loop exits immediately.
C2: assert(workers_at_barrier.load() == 2) (passes)
C2: returns from barrier_lock()
W1: --workers_at_barrier: 1
C2 in handle_stuff(): Now barrier == true and workers_at_barrier == 1. The assertion fails.
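
The trace above can be replayed deterministically on a single thread, which makes it easier to see that no memory ordering is involved. The following is only a sketch: the names trace_result and replay_failing_interleaving are hypothetical, and every operation uses the default sequentially consistent ordering. It performs the same atomic operations in the same order as the trace and reports the state C2 would observe in handle_stuff().

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the state C2 sees when it reaches handle_stuff().
struct trace_result
{
    bool barrier_up;
    std::uint32_t workers_at_barrier;
    bool invariant_holds; // barrier && workers_at_barrier == 2
};

// Replays the interleaving step by step on one thread (illustration only).
trace_result replay_failing_interleaving()
{
    // Starting point: C1 holds the barrier, W1 and W2 are parked.
    std::atomic<bool> barrier{true};
    std::atomic<std::uint32_t> workers_at_barrier{2};

    barrier.store(false);                                        // C1: barrier_unlock()
    const bool w1_left_spin = !barrier.load();                   // W1: spin loop exits
    const bool c2_took_barrier = !barrier.exchange(true);        // C2: takes the barrier
    const bool c2_wait_skipped = workers_at_barrier.load() == 2; // C2: stale count, no wait
    --workers_at_barrier;                                        // W1: late decrement

    // Each step of the trace behaved as described.
    assert(w1_left_spin && c2_took_barrier && c2_wait_skipped);

    return {barrier.load(), workers_at_barrier.load(),
            barrier.load() && workers_at_barrier.load() == 2};
}
```

Calling replay_failing_interleaving() yields a raised barrier with a count of 1, so the condition asserted in handle_stuff() is false even though every access is sequentially consistent.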

I'm not sure of the best fix offhand. Perhaps barrier should have a third "draining" state, in which the control thread still owns the barrier but the workers can leave it. Only after they have done so does the control thread fully release the barrier.
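
One possible shape for that three-state idea is sketched below. It is an untested-in-production illustration, not a definitive fix: the class name draining_barrier, its member names, and the use of std::this_thread::yield() in place of __builtin_ia32_pause() are all assumptions made for this example. The point is that a second control thread can only take the barrier from the open state, which is reached only after the parked-worker count has dropped back to zero, so it can never observe a stale count.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical three-state barrier; names are illustrative, not from the question.
class draining_barrier
{
    enum : std::uint8_t { open, raised, draining };

public:
    explicit draining_barrier(std::uint32_t worker_count) : workers_(worker_count) {}

    // Control-plane threads only: take ownership, then wait for all workers to park.
    void lock()
    {
        for (;;)
        {
            std::uint8_t expected = open;
            // Ownership can only be taken from the open state, never while draining.
            if (state_.compare_exchange_weak(expected, raised,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
                break;
            while (state_.load(std::memory_order_relaxed) != open)
                std::this_thread::yield();
        }
        while (parked_.load() != workers_)
            std::this_thread::yield();
    }

    // Control-plane threads only: let workers leave, then fully release.
    void unlock()
    {
        state_.store(draining, std::memory_order_release);
        while (parked_.load() != 0) // still the owner while workers drain
            std::this_thread::yield();
        state_.store(open, std::memory_order_release);
    }

    // Worker threads only: park while the barrier is raised.
    void wait()
    {
        if (state_.load(std::memory_order_acquire) != raised)
            return; // open or draining: pass through
        ++parked_;
        while (state_.load(std::memory_order_acquire) == raised)
            std::this_thread::yield();
        --parked_;
    }

    // The condition asserted in handle_stuff(), expressed for this class.
    bool held_with_all_parked() const
    {
        return state_.load() == raised && parked_.load() == workers_;
    }

private:
    std::atomic<std::uint8_t> state_{open};
    std::atomic<std::uint32_t> parked_{0};
    const std::uint32_t workers_;
};
```

In the question's terms, barrier_lock() and barrier_unlock() would map to lock() and unlock(), and wait_at_barrier() to wait(). The cost of this design is that unlock() holds off other control threads until every worker has left, which slightly lengthens the critical section.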

Solution 1
2 2022-07-15 08:16:35
