

C++ using memory_order_relaxed on two different variables

What is the most correct way to relax the synchronization on the loads of valA and valB in ThreadMethodOne (assuming there is no false sharing of a cache line between valA and valB)? It seems I should not simply change ThreadMethodOne to load valA with memory_order_relaxed: once that change is made, the compiler could move valA.load after valB.load, because the memory_order_acquire on valB.load does not prevent an earlier load from being reordered after it. It also seems that I can't use memory_order_relaxed on valB.load, since it would then no longer synchronize with the fetch_add in ThreadMethodTwo. Would it be better to swap the two loads and relax the load of valA?

Is this the correct change?

nTotal += valB.load(std::memory_order_acquire);
nTotal += valA.load(std::memory_order_relaxed);
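
One way to sanity-check the swapped order is sketched below. It is only an illustration under assumptions: the helper name CountOrderViolations, the local atomics, and the iteration count are hypothetical and not part of the original program. The writer increments valA (relaxed) before valB (release), so a reader that loads valB with acquire and then valA with relaxed should never see valA smaller than valB. A run that reports zero violations does not prove the ordering is correct (that follows from the release/acquire rules), but a non-zero count would show it is broken.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not from the original post): runs one writer and one
// reader and counts how often the reader observes valA < valB.
int CountOrderViolations(int nIterations)
{
    std::atomic_int valA {0};
    std::atomic_int valB {0};
    std::atomic_bool bDone {false};
    int nViolations = 0;  // written only by the reader, read after join()

    std::thread writer([&] {
        for (int i = 0; i < nIterations; ++i)
        {
            // Same order as ThreadMethodTwo in the full program.
            valA.fetch_add(1, std::memory_order_relaxed);
            valB.fetch_add(1, std::memory_order_release);
        }
        bDone.store(true, std::memory_order_release);
    });

    std::thread reader([&] {
        while (!bDone.load(std::memory_order_acquire))
        {
            // Swapped order: acquire valB first, then a relaxed load of valA.
            int b = valB.load(std::memory_order_acquire);
            int a = valA.load(std::memory_order_relaxed);
            if (a < b)
                ++nViolations;
        }
    });

    writer.join();
    reader.join();
    return nViolations;
}
```

Calling CountOrderViolations(100000) from main and printing the result is enough to try it out.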

Looking at the results on Compiler Explorer, the generated code for ThreadMethodOne appears to be the same when memory_order_relaxed is used for either valA or valB, even when I don't swap the order of the loads. I also see that the memory_order_relaxed fetch_add in ThreadMethodTwo still compiles to the same code as memory_order_release. Replacing the relaxed fetch_add with the following line seems to turn it into a non-locked add: 'valA.store(valA.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);' But I don't know if this is better.
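
For context on that last line: a separate load followed by a store is two atomic operations, not one atomic read-modify-write, which is why the compiler does not need a locked instruction for it. The sketch below is an illustration only; the helper names SingleWriterIncrement and RunSingleWriter are hypothetical. It assumes exactly one thread ever writes the counter. With two or more writers, increments done this way can be lost, and fetch_add would be needed.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: increments via a separate relaxed load and store.
// This is NOT an atomic read-modify-write; it is only safe when exactly
// one thread ever writes to the counter.
void SingleWriterIncrement(std::atomic_int& counter)
{
    counter.store(counter.load(std::memory_order_relaxed) + 1,
                  std::memory_order_relaxed);
}

// Runs a single writer thread and returns the final counter value.
int RunSingleWriter(int nIterations)
{
    std::atomic_int counter {0};

    std::thread writer([&] {
        for (int i = 0; i < nIterations; ++i)
            SingleWriterIncrement(counter);
    });

    writer.join();
    return counter.load(std::memory_order_relaxed);
}
```

With a single writer, RunSingleWriter(n) returns n, because no other thread can modify the counter between the load and the store.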

Full program:

#include <stdio.h>
#include <stdlib.h>
#include <thread>
#include <atomic>
#include <unistd.h>

bool bDone { false };
std::atomic_int valA {0};
std::atomic_int valB {0};

void ThreadMethodOne()
{
    while (!bDone)
    {
        int nTotal {0};
        nTotal += valA.load(std::memory_order_acquire);
        nTotal += valB.load(std::memory_order_acquire);
        printf("Thread total %d\n", nTotal);
    }
}

void ThreadMethodTwo()
{
    while (!bDone)
    {
        valA.fetch_add(1, std::memory_order_relaxed);
        valB.fetch_add(1, std::memory_order_release);
    }
}

int main()
{
    std::thread tOne(ThreadMethodOne);
    std::thread tTwo(ThreadMethodTwo);

    usleep(100000);
    bDone = true;

    tOne.join();
    tTwo.join();

    int nTotal = valA.load(std::memory_order_acquire);
    nTotal += valB.load(std::memory_order_acquire);
    printf("Completed total %d\n", nTotal);
}

A better sample follows. I am leaving the original one above, since it is the one discussed in the comments.

#include <stdio.h>
#include <stdlib.h>
#include <thread>
#include <atomic>
#include <unistd.h>

std::atomic_bool bDone { false };
std::atomic_int valA {0};
std::atomic_int valB {0};

void ThreadMethodOne()
{
    while (!bDone)
    {
        int nTotalA = valA.load(std::memory_order_acquire);
        int nTotalB = valB.load(std::memory_order_relaxed);
        printf("Thread total A: %d B: %d\n", nTotalA, nTotalB);
    }
}

void ThreadMethodTwo()
{
    while (!bDone)
    {
        valB.fetch_add(1, std::memory_order_relaxed);
        valA.fetch_add(1, std::memory_order_release);
    }
}

int main()
{
    std::thread tOne(ThreadMethodOne);
    std::thread tTwo(ThreadMethodTwo);

    usleep(100000);
    bDone = true;

    tOne.join();
    tTwo.join();

    int nTotalA = valA.load(std::memory_order_acquire);
    int nTotalB = valB.load(std::memory_order_relaxed);
    printf("Completed total A: %d B: %d\n", nTotalA, nTotalB);
}

After cleaning up your code (see my comment), we get something like:

#include <atomic>
#include <iostream>
#include <thread>

std::atomic_int valA {0};
std::atomic_int valB {0};

void ThreadMethodOne()
{
    int nTotalA = valA.load(std::memory_order_acquire);
    int nTotalB = valB.load(std::memory_order_relaxed);
    std::cout << "Thread total A: " << nTotalA << " B: " << nTotalB << '\n';
}

void ThreadMethodTwo()
{
    valB.fetch_add(1, std::memory_order_relaxed);
    valA.fetch_add(1, std::memory_order_release);
}

int main()
{
    std::thread tOne(ThreadMethodOne);
    std::thread tTwo(ThreadMethodTwo);

    tOne.join();
    tTwo.join();

    int nTotalA = valA.load(std::memory_order_acquire);
    int nTotalB = valB.load(std::memory_order_relaxed);
    std::cout << "Completed total A: " << nTotalA << " B: " << nTotalB << '\n';
}

The possible outcomes of this program are:

Thread total A: 0 B: 0
Completed total A: 1 B: 1

or

Thread total A: 0 B: 1
Completed total A: 1 B: 1

or

Thread total A: 1 B: 1
Completed total A: 1 B: 1

The reason it always prints Completed total A: 1 B: 1 is that thread 2 has been joined and has therefore finished, having added 1 to each variable; the loads in thread 1 have no influence on that.

If thread 1 runs to completion before thread 2 starts, it will obviously print 0 0, while if thread 2 runs to completion before thread 1 starts, thread 1 will print 1 1. Note that doing a memory_order_acquire load in thread 1 doesn't enforce anything by itself: it can easily read the initial value of 0.

If the threads run more or less at the same time, then the outcome 0 1 is also quite trivial: thread 1 might execute its first line, then thread 2 executes both of its lines, and finally thread 1 reads the value written by thread 2 to valB. It doesn't have to read that value, because the load is relaxed, but in that case we just get the 0 0 output; at the very least, it is possible that it will read 1.

So, the only question of interest is: why don't we see an output of 1 0?

The reason is that if thread 1 reads the value 1 for valA, then that has to be the value written by thread 2. Here the write whose value is read is a write release, while the read itself is a read acquire. This causes a synchronization to happen: every side effect of thread 2 that happened before the write release becomes visible to every memory access in thread 1 that comes after the read acquire. In other words, if we read valA==1, then the subsequent read of valB (relaxed or not) will see thread 2's write to valB, and thus always sees a 1 and never a 0.
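
The argument above can be exercised with a small experiment. This is a rough sketch under assumptions: the helper name CountForbiddenOutcomes, the per-trial local atomics, and the trial count are hypothetical. It repeats the two-thread program many times and counts how often thread 1 observes A: 1 B: 0. Testing like this cannot prove the guarantee, which comes from the release/acquire rules, but the count is expected to stay at zero.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: repeats the two-thread experiment nTrials times and
// counts how often thread 1 sees the outcome "A: 1 B: 0".
int CountForbiddenOutcomes(int nTrials)
{
    int nForbidden = 0;
    for (int i = 0; i < nTrials; ++i)
    {
        std::atomic_int valA {0};
        std::atomic_int valB {0};
        int nTotalA = 0;  // written by tOne, read after join()
        int nTotalB = 0;

        std::thread tOne([&] {
            nTotalA = valA.load(std::memory_order_acquire);
            nTotalB = valB.load(std::memory_order_relaxed);
        });
        std::thread tTwo([&] {
            valB.fetch_add(1, std::memory_order_relaxed);
            valA.fetch_add(1, std::memory_order_release);
        });

        tOne.join();
        tTwo.join();

        // Reading valA == 1 synchronizes with the release in tTwo, so the
        // earlier write to valB must be visible as well.
        if (nTotalA == 1 && nTotalB == 0)
            ++nForbidden;
    }
    return nForbidden;
}
```

The outcomes 0 0, 0 1 and 1 1 may all show up across trials, depending on scheduling; only 1 0 is ruled out.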

Unfortunately I cannot say more about this because your question is very unclear: I don't know what outcome you expected, or want, so I can say nothing about the memory ordering requirements for that to happen.
