
C++ countdown in CyclicBarrier going wrong using atomic variables [solutions without locks please]

I am trying to implement a cyclic barrier in C++ from scratch. The aim is to conform to the Java implementation as closely as possible. The class reference is here: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CyclicBarrier.html

In my testing, returnStatus should be, for each thread that successfully trips the barrier, a value ranging from barrierLimit-1 down to zero. I am trying to achieve this using atomic variables and memory fences, but my code fails testing: in some cases two threads get the same value of returnStatus.

Could someone please suggest a technique that would help resolve this? I want to solve it without using locks so that I can keep the implementation as lock-free as possible.

The full code is at: https://github.com/anandkulkarnisg/CyclicBarrier/blob/master/CyclicBarrier.cpp

Sample test case result is below [ buggy case ]:

I am currently in thread id = 140578053969664.My barrier state count is = 4
I am currently in thread id = 140577877722880.My barrier state count is = 2
I am currently in thread id = 140577550407424.My barrier state count is = 1
I am currently in thread id = 140577936471808.My barrier state count is = 2
I am currently in thread id = 140577760225024.My barrier state count is = 0


The code snippet is below.

        // First check and ensure that the barrier is in good / broken state.
        if(!m_barrierState && !m_tripStatus)
        {
            // First check the status of the variable and immediately exit throwing exception if the count is zero.
            int returnResult;
            if(m_count == 0)
                throw std::string("The barrier has already tripped. Pleas reset the barrier before use again!!" + std::to_string(returnResult));

            // First ensure that the current wait gets the waiting result assigned immediately.

            std::atomic_thread_fence(std::memory_order_acquire);
            m_count.fetch_sub(1, std::memory_order_seq_cst);
            returnResult = m_count.load();
            std::atomic_thread_fence(std::memory_order_release);

std::atomic_thread_fence(std::memory_order_acquire);
m_count.fetch_sub(1, std::memory_order_seq_cst);      // [1]
returnResult = m_count.load();                        // [2]
std::atomic_thread_fence(std::memory_order_release);

[2] Multiple threads are doing this step at the same time. std::atomic_thread_fence does not prevent other threads from running the same code at the same time. That's how two threads can end up with the same value.
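To make the race concrete, here is a minimal sketch that replays, on a single thread, one interleaving that two threads A and B could produce with the decrement-then-load sequence. The function name replay_racy_interleaving and the local variable names are hypothetical and are not taken from the linked repository.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical helper: replays one possible interleaving of two threads
// (A and B) that each run "[1] fetch_sub, then [2] load".
// Returns the values A and B would store in returnResult.
std::pair<int, int> replay_racy_interleaving(int initial_count)
{
    assert(initial_count >= 2);
    std::atomic<int> count{initial_count};

    count.fetch_sub(1);           // A runs [1]
    count.fetch_sub(1);           // B runs [1] before A reaches [2]
    int a_result = count.load();  // A runs [2] and sees both decrements
    int b_result = count.load();  // B runs [2] and sees the same value

    return {a_result, b_result};
}
```

Starting from a count of 4, both results come out as 2 and the value 3 is never reported, which is the same pattern as in the sample output above.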

Instead, capture the return value of the fetch_sub on the line marked [1]:

returnResult = m_count.fetch_sub(1, std::memory_order_seq_cst) - 1;
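As a rough, self-contained illustration of why this works, the sketch below starts several threads that each take their value directly from fetch_sub. The helper name claim_arrival_indices and the results vector are assumptions made for this example only; the real class has more state than this.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: starts `parties` threads that each claim one value
// from a shared countdown and returns the claimed values sorted ascending.
std::vector<int> claim_arrival_indices(int parties)
{
    assert(parties > 0);
    std::atomic<int> count{parties};
    std::vector<int> results(parties, -1);
    std::vector<std::thread> workers;
    workers.reserve(parties);

    for (int slot = 0; slot < parties; ++slot) {
        workers.emplace_back([&count, &results, slot] {
            // fetch_sub returns the value held before the decrement, so
            // subtracting 1 yields this thread's own post-decrement value.
            // Each thread writes only its own slot, so there is no data race.
            results[slot] = count.fetch_sub(1) - 1;
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    std::sort(results.begin(), results.end());
    return results;
}
```

Because the decrement and the read are a single atomic read-modify-write, no two threads can observe the same value: with five parties the sorted result is 0, 1, 2, 3, 4 regardless of scheduling.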

By the way, I'm pretty sure you don't need the fences here. (I can't really tell without seeing more of the function.) If you do, you might just switch returnResult to be an atomic instead.

It looks like you're using fences as if they were transactional memory. They are not. The release essentially controls the ordering guarantees for stores as perceived by any CPU that uses an acquire. As long as it doesn't break the ordering guarantees, the write is free to propagate before the release is actually processed. As a thought experiment, imagine that [1] is executed, then a context switch occurs, a million years pass, and then [2] is executed. It's now clearly absurd to assume that m_count holds the same value that it did a million years ago. The release may flush the write buffer, but it's possible that the change was flushed already.
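For contrast, here is a small sketch of what a release/acquire fence pair is typically used for: ordering an ordinary write before a flag that another thread polls. The function publish_and_read and its variables are made up for this illustration and have nothing to do with the barrier code itself.

```cpp
#include <atomic>
#include <thread>

// Hypothetical demo: hands a plain int to another thread using a
// release fence on the writer side and an acquire fence on the reader side.
// Returns the value the reader thread observed.
int publish_and_read(int value)
{
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread consumer([&] {
        // Poll the flag with a relaxed load, then fence.
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;              // ordered after the producer's write
    });

    payload = value;                                      // plain write
    std::atomic_thread_fence(std::memory_order_release);  // orders it before the flag
    ready.store(true, std::memory_order_relaxed);
    consumer.join();
    return seen;
}
```

The fences here only order the payload write relative to the flag. They do nothing to stop a second thread from running the same code in between, which is why they cannot make a separate decrement and load behave as one step.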

Lastly, weird stuff can happen if you mix seq_cst with acquire/release semantics. Sorry that's vague, but I don't understand it well enough to try to explain it.
