
Is it possible that a store with memory_order_relaxed never reaches other threads?

Original post: 2017-05-03 02:15:40 3 3 c++/ c++11/ memory-barriers/ relaxed-atomics

Suppose I have a thread A that writes to an atomic_int x = 0; using x.store(1, std::memory_order_relaxed). Without any other synchronization, how long would it take before other threads can see this using x.load(std::memory_order_relaxed)? Is it possible that the value written to x stays entirely thread-local, given the current definition of the C/C++ memory model in the standard?

The practical case I have at hand is one where a thread B frequently reads an atomic_bool to check whether it has to quit. Another thread, at some point, writes true to this bool and then calls join() on thread B. Clearly I do not mind calling join() before thread B can even see that the atomic_bool was set, nor do I mind if thread B has already seen the change and exited before I call join(). But I am wondering: using memory_order_relaxed on both sides, is it possible to call join() and block "forever" because the change is never propagated to thread B?
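The scenario described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the names quit, worker and run_stop_flag_demo are hypothetical. Whether join() returns depends on the relaxed store eventually becoming visible to thread B, which the standard only says an implementation "should" ensure; on mainstream implementations it does.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration; not taken from the question.
std::atomic<bool> quit{false};

// Thread B: polls the flag with a relaxed load until told to quit.
void worker() {
    while (!quit.load(std::memory_order_relaxed)) {
        std::this_thread::yield();  // stand-in for real work
    }
}

// Starts B, requests it to quit with a relaxed store, then joins.
// Returns only if B eventually observes the store.
bool run_stop_flag_demo() {
    quit.store(false, std::memory_order_relaxed);
    std::thread b(worker);
    quit.store(true, std::memory_order_relaxed);
    b.join();  // blocks until worker() has seen quit == true and returned
    assert(quit.load(std::memory_order_relaxed));  // sanity check: flag is still set
    return true;
}
```

Calling run_stop_flag_demo() from main() is enough to try it. Relaxed ordering suffices here because nothing else is communicated through the flag; join() itself synchronizes with the completion of thread B.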

Edit

I contacted Mark Batty (the brain behind mathematically verifying and subsequently fixing the C++ memory model requirements). Originally it was about something else (which turned out to be a known bug in cppmem and his thesis, so fortunately I didn't make a complete fool of myself), and I took the opportunity to ask him about this too. His answer was:

Q: Can it theoretically be that such a store [memory_order_relaxed without (any following) release operation] never reaches the other thread?
Mark: Theoretically, yes, but I don't think that has been observed.
Q: In other words, do relaxed stores make no sense whatsoever unless you combine them with some release operation (and acquire on the other thread), assuming you want another thread to see it?
Mark: Nearly all of the use cases for them do use release and acquire, yes.
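For reference, the combination mentioned in that exchange (a relaxed store followed by a release operation, paired with an acquire on the other thread) might look like the sketch below. The names payload, ready and run_release_acquire_demo are assumptions made for illustration. Note that the release/acquire pairing guarantees ordering (once the reader sees ready == true it must also see payload == 42); it does not, by itself, put a bound on how long the reader waits.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration.
std::atomic<int> payload{0};
std::atomic<bool> ready{false};

// Writer publishes payload with a relaxed store followed by a release store;
// reader spins on an acquire load and then reads payload with a relaxed load.
int run_release_acquire_demo() {
    payload.store(0, std::memory_order_relaxed);
    ready.store(false, std::memory_order_relaxed);

    int seen = -1;
    std::thread consumer([&seen] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // The acquire load that read 'true' synchronizes with the release
        // store, so the earlier relaxed store to payload is visible here.
        seen = payload.load(std::memory_order_relaxed);
    });

    payload.store(42, std::memory_order_relaxed);
    ready.store(true, std::memory_order_release);
    consumer.join();

    assert(seen == 42);  // guaranteed by the release/acquire pairing
    return seen;
}
```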

3 Answers

This is all the standard has to say on the matter, I believe:

[intro.multithread]/25 An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.

In practice

Without any other synchronization methods, how long would it take before other threads can see this, using x.load(std::memory_order_relaxed)?

No time. It's a normal write: it goes to the store buffer, so it will be available in the L1d cache in less time than the blink of an eye. But that only holds once the assembly instruction is actually run.

Instructions can be reordered by the compiler, but no reasonable compiler would reorder an atomic operation across arbitrarily long loops.

In theory

Q: Can it theoretically be that such a store [memory_order_relaxed without (any following) release operation] never reaches the other thread?

Mark: Theoretically, yes,

You should have asked him what would happen if the "following release fence" was added back. Or with an atomic store-release operation.

Why wouldn't these be reordered and delayed a loooong time? (So long that it seems like an eternity in practice.)

Is it possible that the value written to x stays entirely thread-local given the current definition of the C/C++ memory model that the standard gives?

If an imaginary and especially perverse implementation wanted to delay the visibility of atomic operations, why would it do that only for relaxed operations? It could just as well do it for all atomic operations.

Or never run some threads.

Or run some threads so slowly that you would believe they aren't running.

This is what the standard says in 29.3.12:

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

There is no guarantee that a store will become visible in another thread, there is no guaranteed timing, and there is no formal relationship with memory order.

Of course, on every mainstream architecture a store will become visible, but on rare platforms that do not support cache coherency, it may never become visible to a load.
In that case, you would have to reach for an atomic read-modify-write operation to get the latest value in the modification order.
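As a rough sketch of that last suggestion: the standard requires an atomic read-modify-write to read the last value in the modification order before its own write, so a "do-nothing" RMW such as fetch_add(0) can be used in place of a plain load. The helper names read_latest and demo_read_latest below are hypothetical, and whether a particular compiler keeps this as a real RMW instruction is an implementation matter.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: reads x through a read-modify-write instead of a load.
// Adding 0 leaves the stored value unchanged but still counts as an RMW.
int read_latest(std::atomic<int>& x) {
    return x.fetch_add(0, std::memory_order_relaxed);
}

// Single-threaded check that the helper returns the stored value.
bool demo_read_latest() {
    std::atomic<int> x{0};
    x.store(1, std::memory_order_relaxed);
    assert(read_latest(x) == 1);
    return true;
}
```

In the polling scenario from the question, the reader thread would call read_latest(flag) in its loop instead of flag.load(std::memory_order_relaxed), at the cost of a more expensive operation on each iteration.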