
C++ memory_order with fences and acquire/release

I have the following C++11 code:

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> x, y;
std::atomic<int> z;

void f() {
   x.store(true, std::memory_order_relaxed);
   std::atomic_thread_fence(std::memory_order_release);
   y.store(true, std::memory_order_relaxed);
}

void g() {
   while (!y.load(std::memory_order_relaxed)) {}
   std::atomic_thread_fence(std::memory_order_acquire);
   if (x.load(std::memory_order_relaxed)) ++z;
}

int main() {
   x = false;
   y = false;
   z = 0;
   std::thread t1(f);
   std::thread t2(g);
   t1.join();
   t2.join();
   assert(z.load() != 0);
   return 0;
}

In my computer architecture class, we were told that the assert in this code always holds. But after reviewing it thoroughly now, I can't really understand why that is so.

From what I know:

  • A fence with 'memory_order_release' will not allow previous stores to be reordered after it.
  • A fence with 'memory_order_acquire' will not allow any load that comes after it to be reordered before it.

If my understanding is correct, why can't the following sequence of actions occur?

  1. Inside t1, y.store(true, std::memory_order_relaxed); is called
  2. t2 runs entirely, and will see 'false' when loading x, therefore not increasing z
  3. t1 finishes execution
  4. In the main thread, the assert fails because z.load() returns 0

I think this complies with the acquire/release rules, but, for example, the top answer to this question: Understanding c++11 memory fences, which is very similar to my case, hints that something like step 1 in my sequence of actions cannot happen before the 'memory_order_release' fence, but doesn't go into detail about the reason behind it.

I'm terribly puzzled about this, and would be very glad if anyone could shed some light on it :)

Exactly what happens in each of these cases depends on what processor you are actually using. For example, x86 would probably not trip the assert on this, since it is a cache-coherent architecture (you can have race conditions, but once a value is written out to cache/memory from the processor, all other processors will read that value; of course, that doesn't stop another processor from writing a different value immediately after, etc.).

So assuming this is running on an ARM or similar processor that isn't guaranteed to be cache-coherent by itself:

Because the write to x is done before the memory_order_release fence, the t2 loop will not exit the while(y...) until y is true, and at that point x is also true. This means that when x is read later on, it is guaranteed to be true, so z is updated. My only slight query is whether you don't need a release for z as well... If main is running on a different processor than t1 and t2, then z may still have a stale value in main.

Of course, that's not GUARANTEED to happen if you have a multitasking OS (or just interrupts that do enough stuff, etc.), since if the processor that ran t1 gets its cache flushed, then t2 may well read the new value of x.

And like I said, this won't have that effect on x86 processors (AMD or Intel ones).

So, to explain barrier instructions in general (also applicable to Intel and AMD processors):

First, we need to understand that although instructions can start and finish out of order, the processor does have a general "understanding" of order. Let's say we have this "pseudo-machine-code":

 ...
 mov $5, x
 cmp a, b
 jnz L1
 mov $4, x

L1: ...

The processor could speculatively execute mov $4, x before it completes the jnz L1; so, to solve this, the processor would have to roll back the mov $4, x in the case where the jnz L1 was taken.

Likewise, if we have:

 mov $1, x
 wmb         // "write memory barrier"
 mov $1, y

the processor has rules that say "do not execute any store instruction issued AFTER wmb until all stores before it have been completed". It is a "special" instruction; it's there for the precise purpose of guaranteeing memory ordering. If it doesn't do that, you have a broken processor, and someone in the design department has "his ass on the line".

Equally, the "read memory barrier" is an instruction which guarantees, by design of the processor, that the processor will not complete any later read until the pending reads issued before the barrier instruction have completed.
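To connect this back to C++11, here is a small sketch of my own (the names payload and ready are illustrative, not taken from the question) of how the wmb / read-barrier pair above maps onto std::atomic_thread_fence:

 #include <atomic>

 std::atomic<int>  payload{0};     // data being published
 std::atomic<bool> ready{false};   // flag guarded by the barriers

 void writer() {
     payload.store(42, std::memory_order_relaxed);
     std::atomic_thread_fence(std::memory_order_release);  // plays the role of "wmb"
     ready.store(true, std::memory_order_relaxed);
 }

 int reader() {
     while (!ready.load(std::memory_order_relaxed)) {}     // spin until the flag is seen
     std::atomic_thread_fence(std::memory_order_acquire);  // plays the role of the read barrier
     return payload.load(std::memory_order_relaxed);       // guaranteed to return 42
 }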

As long as we're not working on "experimental" processors or some skanky chip that doesn't work correctly, it WILL work that way. It's part of the definition of that instruction. Without such guarantees, it would be impossible (or at least extremely complicated and "expensive") to implement (safe) spinlocks, semaphores, mutexes, etc.
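For instance, a minimal spin-lock sketch relying on exactly these guarantees could look like this (my own illustration; lock_flag and the function names are not from the question):

 #include <atomic>

 // Acquire on test_and_set keeps the protected operations from floating
 // above lock(); release on clear keeps them from sinking below unlock().
 std::atomic_flag lock_flag = ATOMIC_FLAG_INIT;

 void lock() {
     while (lock_flag.test_and_set(std::memory_order_acquire)) {
         // spin until we win the flag
     }
 }

 void unlock() {
     lock_flag.clear(std::memory_order_release);
 }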

There are often also "implicit memory barriers", that is, instructions that act as memory barriers even though that is not their primary purpose. Software interrupts (an "INT X" instruction or similar) tend to do this.

I don't like arguing about C++ concurrency questions in terms of "this processor does this, that processor does that". C++11 has a memory model, and we should be using this memory model to determine what is valid and what isn't. CPU architectures and their memory models are usually even harder to understand. Plus there's more than one of them.

With this in mind, consider this: thread t2 is blocked in the while loop until t1 executes the y.store and the change has propagated to t2. (Which, by the way, could in theory take forever. But that's not realistic.) Therefore the y.load that lets t2 leave the loop reads the value written by the y.store in t1.

Furthermore, we have simple intra-thread (sequenced-before) happens-before relations in t1: the x.store is sequenced before the release barrier, which is sequenced before the y.store.

In t2, the true-returning y.load is sequenced before the acquire barrier, which is sequenced before the x.load.

Now the fence rule applies: because the release barrier is sequenced before the y.store, and the y.load that reads its value is sequenced before the acquire barrier, the release barrier synchronizes with the acquire barrier. Since happens-before is transitive, the x.store happens-before the x.load, which means the load has to see the value stored.
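For comparison, here is a sketch of my own of the same guarantee expressed without free-standing fences, by tagging the y operations directly (x2, y2, z2 are illustrative stand-ins for the question's variables):

 #include <atomic>

 std::atomic<bool> x2, y2;
 std::atomic<int> z2;

 void f2() {
     x2.store(true, std::memory_order_relaxed);
     y2.store(true, std::memory_order_release);      // takes the place of the release fence
 }

 void g2() {
     while (!y2.load(std::memory_order_acquire)) {}  // takes the place of the acquire fence
     if (x2.load(std::memory_order_relaxed)) ++z2;   // must see x2 == true
 }

For this example both forms give the same guarantee; the fence form in the question merely decouples the ordering from any one particular atomic operation.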

Finally, the increment of z (the ++z, an atomic read-modify-write) happens-before the thread termination, which happens-before the main thread waking from t2.join, which happens-before the z.load in the main thread, so the modification to z must be visible in the main thread.
