
Concurrency: Atomic and volatile in C++11 memory model

A global variable is shared across 2 concurrently running threads on 2 different cores. The threads write to and read from the variable. For an atomic variable, can one thread read a stale value? Each core might have a value of the shared variable in its cache, and when one thread writes to its copy in a cache, the other thread on a different core might read a stale value from its own cache. Or does the compiler enforce strong memory ordering so that the latest value is read from the other cache? The C++11 standard library has std::atomic support. How is this different from the volatile keyword? How will volatile and atomic types behave differently in the above scenario?

Firstly, volatile does not imply atomic access. It is designed for things like memory mapped I/O and signal handling. volatile is completely unnecessary when used with std::atomic, and unless your platform documents otherwise, volatile has no bearing on atomic access or memory ordering between threads.

If you have a global variable which is shared between threads, such as:

std::atomic<int> ai;

then the visibility and ordering constraints depend on the memory ordering parameter you use for operations, and the synchronization effects of locks, threads and accesses to other atomic variables.

In the absence of any additional synchronization, if one thread writes a value to ai then there is nothing that guarantees that another thread will see the value in any given time period. The standard specifies that it should be visible "in a reasonable period of time", but any given access may return a stale value.
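
A minimal sketch of this point, assuming the ai variable above and two plain std::thread workers; the reader may legitimately print 0 if it runs before the writer's store has become visible to it:

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> ai{0};

int main() {
    std::thread writer([] {
        ai.store(42);                    // seq_cst store by default
    });
    std::thread reader([] {
        // May print 0 or 42: nothing forces the store to be visible yet;
        // the standard only says it should appear "in a reasonable time".
        std::cout << ai.load() << '\n';  // seq_cst load by default
    });
    writer.join();
    reader.join();
}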

The default memory ordering of std::memory_order_seq_cst provides a single global total order for all std::memory_order_seq_cst operations across all variables. This doesn't mean that you can't get stale values, but it does mean that the value you do get determines and is determined by where in this total order your operation lies.

If you have 2 shared variables x and y, initially zero, and have one thread write 1 to x and another write 2 to y, then a third thread that reads both may see either (0,0), (1,0), (0,2) or (1,2), since there is no ordering constraint between the operations, and thus the operations may appear in any order in the global order.
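
Sketched out with the variable names x and y as above (the exact output varies from run to run):

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};

int main() {
    std::thread t1([] { x.store(1); });  // one thread writes x
    std::thread t2([] { y.store(2); });  // another thread writes y
    std::thread t3([] {
        int a = x.load();
        int b = y.load();
        // Any of (0,0), (1,0), (0,2) or (1,2) is a legal outcome,
        // depending on where each operation falls in the global order.
        std::printf("(%d,%d)\n", a, b);
    });
    t1.join();
    t2.join();
    t3.join();
}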

If both writes are from the same thread, which does x=1 before y=2, and the reading thread reads y before x, then (0,2) is no longer a valid option, since the read of y==2 implies that the earlier write to x is visible. The other 3 pairings (0,0), (1,0) and (1,2) are still possible, depending on how the 2 reads interleave with the 2 writes.
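
The same scenario sketched as code, again under the default sequentially consistent ordering:

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};

int main() {
    std::thread writer([] {
        x.store(1);        // sequenced before the store to y
        y.store(2);
    });
    std::thread reader([] {
        int b = y.load();  // read y first...
        int a = x.load();  // ...then x
        // Seeing y==2 means the earlier store to x must also be visible,
        // so the pairing (0,2) cannot occur; (0,0), (1,0), (1,2) still can.
        if (b == 2) assert(a == 1);
    });
    writer.join();
    reader.join();
}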

If you use other memory orderings such as std::memory_order_relaxed or std::memory_order_acquire then the constraints are relaxed even further, and the single global ordering no longer applies. Threads don't even necessarily have to agree on the ordering of two stores to separate variables if there is no additional synchronization.
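
For contrast, the usual non-seq_cst idiom is a release/acquire pairing; a minimal sketch follows, where the names data and ready are made up purely for illustration:

#include <atomic>
#include <cassert>
#include <thread>

int data = 0;                    // ordinary, non-atomic shared data
std::atomic<bool> ready{false};  // flag that publishes it

int main() {
    std::thread producer([] {
        data = 42;                                      // plain write
        ready.store(true, std::memory_order_release);   // publish
    });
    std::thread consumer([] {
        // The acquire load synchronizes with the release store, so once
        // the flag is seen, the write to data is guaranteed to be visible.
        while (!ready.load(std::memory_order_acquire)) {}
        assert(data == 42);
    });
    producer.join();
    consumer.join();
}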

The only way to guarantee you have the "latest" value is to use a read-modify-write operation such as exchange(), compare_exchange_strong() or fetch_add(). Read-modify-write operations have an additional constraint that they always operate on the "latest" value, so a sequence of ai.fetch_add(1) operations by a series of threads will return a sequence of values with no duplicates or gaps. In the absence of additional constraints, there's still no guarantee which threads will see which values though. In particular, it is important to note that the use of an RMW operation does not force changes from other threads to become visible any quicker; it just means that if the changes are not seen by the RMW then all threads must agree that they are later in the modification order of that atomic variable than the RMW operation. Stores from different threads can still be delayed by arbitrary amounts of time, depending on when the CPU actually issues the store to memory (rather than just its own store buffer), physically how far apart the CPUs executing the threads are (in the case of a multi-processor system), and the details of the cache coherency protocol.
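
A small illustration of the no-duplicates, no-gaps property of fetch_add (the thread and iteration counts are arbitrary):

#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> counter{0};

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back([] {
            for (int j = 0; j < 1000; ++j) {
                // Each fetch_add operates on the latest value in the
                // variable's modification order, so across all threads the
                // returned values 0..3999 each appear exactly once.
                counter.fetch_add(1);
            }
        });
    }
    for (auto& t : threads) t.join();
    // counter == 4000 here, with no lost updates
}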

Working with atomic operations is a complex topic. I suggest you read a lot of background material, and examine published code before writing production code with atomics. In most cases it is easier to write code that uses locks, and not noticeably less efficient.

volatile and the atomic operations have a different background, and were introduced with a different intent.

volatile dates from way back, and is principally designed to prevent compiler optimizations when accessing memory mapped IO. Modern compilers tend to do no more than suppress optimizations for volatile, although on some machines, this isn't sufficient for even memory mapped IO. Except for the special case of signal handlers, and setjmp/longjmp sequences (where the C standard, and in the case of signals, the Posix standard, gives additional guarantees), it must be considered useless on a modern machine, where without special additional instructions (fences or memory barriers), the hardware may reorder or even suppress certain accesses. Since you shouldn't be using setjmp et al. in C++, this more or less leaves signal handlers, and in a multithreaded environment, at least under Unix, there are better solutions for those as well. And possibly memory mapped IO, if you're working on kernel code and can ensure that the compiler generates whatever is needed for the platform in question. (According to the standard, volatile access is observable behavior, which the compiler must respect. But the compiler gets to define what is meant by "access", and most seem to define it as "a load or store machine instruction was executed". Which, on a modern processor, doesn't even mean that there is necessarily a read or write cycle on the bus, much less that it's in the order you expect.)
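
By way of illustration, the classic memory-mapped IO use of volatile looks roughly like this; the register address and bit mask are hypothetical placeholders:

#include <cstdint>

// Hypothetical device register address, purely for illustration.
constexpr std::uintptr_t STATUS_REG_ADDR = 0x40000000u;

// volatile tells the compiler that every access must really happen:
// it may not cache the value in a register or drop "redundant" reads.
volatile std::uint32_t* const status_reg =
    reinterpret_cast<volatile std::uint32_t*>(STATUS_REG_ADDR);

void wait_for_device_ready() {
    // Without volatile, the compiler could hoist the load out of the loop
    // and spin forever on a value read only once.
    while ((*status_reg & 0x1u) == 0) {
        // busy-wait; the hardware, not this program, changes the register
    }
}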

Given this situation, the C++ standard added atomic access, which does provide a certain number of guarantees across threads; in particular, the code generated around an atomic access will contain the necessary additional instructions to prevent the hardware from reordering the accesses, and to ensure that the accesses propagate down to the global memory shared between cores on a multicore machine. (At one point in the standardization effort, Microsoft proposed adding these semantics to volatile, and I think some of their C++ compilers do. After discussion of the issues in the committee, however, the general consensus, including the Microsoft representative, was that it was better to leave volatile with its original meaning, and to define the atomic types.) Or just use the system level primitives, like mutexes, which execute whatever instructions are needed in their code. (They have to. You can't implement a mutex without some guarantees concerning the order of memory accesses.)
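
As that last parenthetical suggests, a lock-based version of a shared counter is often the simplest correct option; a brief sketch under those assumptions:

#include <mutex>
#include <thread>

int shared_value = 0;  // ordinary data, protected by the mutex below
std::mutex m;

void increment() {
    // Locking and unlocking issue whatever fences the platform requires,
    // so the plain increment is safe even across cores.
    std::lock_guard<std::mutex> lock(m);
    ++shared_value;
}

int main() {
    std::thread t1(increment), t2(increment);
    t1.join();
    t2.join();
    // shared_value == 2 here
}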

Here's a basic synopsis of what the 2 things are:

1) Volatile keyword:
Tells the compiler that this value could alter at any moment and therefore it should not EVER cache it in a register. Look up the old "register" keyword in C: "volatile" is basically the opposite of "register". Modern compilers now do by default the optimization that "register" used to explicitly request, so in practice you only ever see volatile anymore. Using the volatile qualifier will guarantee that your processing never uses a stale value, but nothing more.

2) Atomic:
Atomic operations modify data in a single, indivisible step, so that it is impossible for ANY other thread to access the data in the middle of such an update. They're usually limited to whatever single atomic assembly instructions the hardware supports: things like ++, --, and swapping 2 pointers. Note that this says nothing about the ORDER in which the different threads will run the atomic instructions, only that they will never run in parallel. That's why you have all those additional options for forcing an ordering.

Volatile and Atomic serve different purposes.

Volatile: Informs the compiler to avoid optimizing accesses. This keyword is used for variables that may change unexpectedly. So, it can be used to represent hardware status registers, variables modified in an ISR, and variables shared in a multi-threaded application.

Atomic: It is also used in multi-threaded applications, but it ensures that there is no lock/stall while it is being used. Atomic operations are free of races and indivisible. A few of the key usage scenarios in a multi-threaded application are checking whether a lock is free or taken, and atomically adding to a value and returning the added value.
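
A brief sketch of both scenarios, using an atomic_flag as the "is the lock free?" check and fetch_add for the add-and-return; the names are made up for illustration:

#include <atomic>
#include <thread>

std::atomic_flag lock_flag = ATOMIC_FLAG_INIT;  // acts as a tiny spinlock
int guarded = 0;                                // protected by lock_flag
std::atomic<int> counter{0};

void worker() {
    // Scenario 1: atomically check whether the "lock" is free and take it.
    while (lock_flag.test_and_set(std::memory_order_acquire)) {
        // spin until another thread clears the flag
    }
    ++guarded;                                  // safe: we hold the lock
    lock_flag.clear(std::memory_order_release);

    // Scenario 2: atomically add and get back the updated value.
    int updated = counter.fetch_add(1) + 1;
    (void)updated;
}

int main() {
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    // guarded == 2 and counter == 2 here
}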
