
atomic exchange with memory_order_acquire and memory_order_release

I have a situation where I would like to prepare some data in one thread:

// My boolean flag
std::atomic<bool> is_data_ready{false};

Thread 1 (producer thread):
  PrepareData();
  if (!is_data_ready.exchange(true, std::memory_order_release)) {
    NotifyConsumerThread();
  }
  else {
    return;
  }

In the consumer thread:

Thread 2:
  if (is_data_ready.exchange(false, std::memory_order_acquire)) {
    ProcessData();
  }

Does it make sense to use acquire/release ordering (instead of acq_rel) for these exchanges? I am not sure I understand correctly: does std::memory_order_release on an exchange mean the store part is a release store? If so, what is the memory order of the load part?

An atomic RMW has a load part and a store part. memory_order_release gives the store side release semantics, while leaving the load side relaxed. The reverse for exchange(val, acquire). With exchange(val, acq_rel) or seq_cst, the load is an acquire load and the store is a release store.

(compare_exchange_weak / _strong can take one memory order for the pure-load case where the compare fails, and a separate memory order for the RMW case where it succeeds. This distinction is meaningful on some ISAs, but not on ones like x86 where it's just a single instruction that effectively always stores, even in the false case.)

And of course atomicity of the exchange (or any other RMW) is guaranteed regardless of anything else; no stores or RMWs to this object by other cores can come between the load and store parts of the exchange. Notice that I didn't mention pure loads, or operations on other objects. See later in this answer, and also For purposes of ordering, is atomic read-modify-write one operation or two?


Yes, this looks sensible, although simplistic and maybe racy in allowing more stuff to be published after the first batch is consumed (or has started to be consumed)1. But for the purposes of understanding how atomic RMWs work, and the ordering of their load and store sides, we can ignore that.

exchange(true, release) "publishes" some shared data stored by PrepareData(), and checks the old value to see if the worker thread needs to get notified.

And in the reader, is_data_ready.exchange(false, acquire) is a load that syncs-with the release store if there was one, creating a happens-before relationship that makes it safe to read that data without data-race UB. And tied to that (as part of the atomic RMW), it lets other threads see that it has gone past the point of checking for new work, so it needs another notify if there is any.


Yes, exchange(value, release) means the store part of the RMW has release ordering wrt. other operations in the same thread. The load part is relaxed, but the load/store pair still forms an atomic RMW. So the load can't take a value until this core has exclusive ownership of the cache line.

Or in C++ terms, it sees the "latest value" in the modification order of is_data_ready; if some other thread was also storing to is_data_ready, that store will happen either before the load (before the whole exchange) or after the store (after the whole exchange).

Note that a pure load in another core coming after the load part of this exchange is indistinguishable from coming before it, so only operations that involve a store are part of the modification order of an object. (That modification order is guaranteed to exist such that all threads can agree on it, even when you're using relaxed loads/stores.)

But the load part of another atomic RMW would have to come before the load part of this exchange; otherwise that other RMW would have this exchange happening between its load and its store. That would violate the atomicity guarantee of the other RMW, so it can't happen. Atomic RMWs on the same object effectively serialize across threads. That's why a million fetch_add(1, mo_relaxed) operations on an atomic counter will increment it by exactly 1 million, regardless of what order they end up running in. (See also C++: std::memory_order in std::atomic_flag::test_and_set to do some work only once by a set of threads re: why atomic RMWs have to work this way.)

C++ is specified in terms of syncs-with, and whether a happens-before guarantee exists that allows your other loads to see stores by other threads. But humans often like to think in terms of local reordering (within the execution of one thread) of operations that access shared memory (via coherent cache).

In terms of a memory-reordering model, the store part of an exchange(val, release) can reorder with later operations other than release or seq_cst operations. (Note that unlocking a mutex counts as a release operation.) But not with any earlier operations. This is what acquire and release semantics are all about, as Jeff Preshing explains: https://preshing.com/20120913/acquire-and-release-semantics/

Wherever the store ends up, the load is at some point before it. Right before it in the modification order of is_data_ready, but operations on other objects by this thread (especially in other cache lines) may be able to happen in between the load and store parts of an atomic exchange.

In practice, some CPU architectures don't make that possible. Notably, x86 atomic RMW operations are always full barriers: they wait for all earlier loads and stores to complete before the exchange, and don't start any later loads or stores until after it. So not even StoreLoad reordering of the store part of an exchange with later loads is possible on x86.

But on AArch64 you can observe StoreLoad reordering of the store part of a seq_cst exchange with a later relaxed load. But only the store part, not the load part; being seq_cst means the load part of the exchange has acquire semantics and thus happens before any later loads. See For purposes of ordering, is atomic read-modify-write one operation or two?


Footnote 1: is this a usable producer/consumer sync algorithm?

With a single boolean flag (not a queue with a read-index / write-index), IDK how a producer would know when it can overwrite the shared variables that the consumer will look at. If it (or another producer thread) did that right away after seeing is_data_ready == false, you'd race with a reader that has just started reading.

If you can solve that problem, this does appear to avoid the possibility of the consumer missing an update and going to sleep, as long as it handles the case where a second writer adds more data and sends a notify before the consumer finishes ProcessData. (The writers only know that the consumer has started, not when it finishes.) I guess this example isn't showing the notification mechanism, which might itself create synchronization.

If two producers run PrepareData() at overlapping times, the first one to finish will send a notification, not both. Unless the consumer does an exchange and resets is_data_ready between the two exchanges in the producers; then it will get a second notification. (So that sounds pretty hard to deal with in the consumer, and in whatever data structure PrepareData() manages, unless it's something like a lock-free queue itself, in which case just check the queue for work instead of using this mechanism. But again, this is still a usable example to talk about how exchange works.)

If a consumer is frequently checking and finding no work to do, that's also extra contention that could be avoided if it checked read-only until it sees a true, and only then exchanged it to false (with an acquire exchange). But since you're worrying about notifications, I assume it's not a spin-wait loop, and instead sleeps if there isn't work to do.
