
What exactly is std::atomic?

I understand that std::atomic<> is an atomic object. But atomic to what extent? To my understanding, an operation can be atomic. What exactly is meant by making an object atomic? For example, if there are two threads concurrently executing the following code:

a = a + 12;

Then is the entire operation (say, add_twelve_to(int)) atomic? Or are only the changes made to the variable atomic (so operator=())?

Each instantiation and full specialization of std::atomic<> represents a type that different threads can simultaneously operate on (through its instances) without raising undefined behavior:

Objects of atomic types are the only C++ objects that are free from data races; that is, if one thread writes to an atomic object while another thread reads from it, the behavior is well-defined.

In addition, accesses to atomic objects may establish inter-thread synchronization and order non-atomic memory accesses as specified by std::memory_order.

std::atomic<> wraps operations that, in pre-C++11 times, had to be performed using (for example) interlocked functions with MSVC or atomic builtins with GCC.

Also, std::atomic<> gives you more control by allowing various memory orders that specify synchronization and ordering constraints. If you want to read more about C++11 atomics and the memory model, resources such as the cppreference documentation on std::memory_order cover this in detail.

Note that, for typical use cases, you would probably use the overloaded arithmetic operators (or another set of them):

std::atomic<long> value(0);
value++; //This is an atomic op
value += 5; //And so is this

Because the operator syntax does not allow you to specify a memory order, these operations are performed with std::memory_order_seq_cst, as this is the default order for all atomic operations in C++11. It guarantees sequential consistency (a total global ordering) between all atomic operations.

In some cases, however, this may not be required (and nothing comes for free), so you may want to use a more explicit form:

std::atomic<long> value {0};
value.fetch_add(1, std::memory_order_relaxed); // Atomic, but there are no synchronization or ordering constraints
value.fetch_add(5, std::memory_order_release); // Atomic, performs 'release' operation

Now, your example:

a = a + 12;

will not evaluate to a single atomic operation: it results in an a.load() (which is itself atomic), then an addition of that value and 12, and finally an a.store() (also atomic) of the result. As I noted earlier, std::memory_order_seq_cst will be used here.

However, if you write a += 12, it will be a single atomic operation (as I noted before) and is roughly equivalent to a.fetch_add(12, std::memory_order_seq_cst).

As for your comment:

A regular int has atomic loads and stores. What's the point of wrapping it with atomic<>?

Your statement is only true for architectures that provide such a guarantee of atomicity for stores and/or loads. There are architectures that do not. Also, it is usually required that operations are performed on a word-/dword-aligned address to be atomic. std::atomic<> is guaranteed to be atomic on every platform, without additional requirements. Moreover, it allows you to write code like this:

void* sharedData = nullptr;
std::atomic<int> ready_flag{0};

// Thread 1
void produce()
{
    sharedData = generateData();
    ready_flag.store(1, std::memory_order_release);
}

// Thread 2
void consume()
{
    while (ready_flag.load(std::memory_order_acquire) == 0)
    {
        std::this_thread::yield();
    }

    assert(sharedData != nullptr); // will never trigger
    processData(sharedData);
}

Note that the assertion condition will always be true (and thus will never trigger), so you can always be sure that the data is ready after the while loop exits. That is because:

  • store() to the flag is performed after sharedData is set (we assume that generateData() always returns something useful, and in particular never returns NULL) and uses std::memory_order_release:

memory_order_release

A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable.

  • sharedData is used after the while loop exits, and thus after load() from the flag has returned a non-zero value. load() uses std::memory_order_acquire:

std::memory_order_acquire

A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread.

This gives you precise control over synchronization and allows you to specify explicitly how your code may or may not behave. This would not be possible if the only guarantee were atomicity itself. It is especially relevant for more interesting synchronization models such as release-consume ordering.

std::atomic exists because many ISAs have direct hardware support for it

What the C++ standard says about std::atomic has been analyzed in other answers.

So now let's see what std::atomic compiles to, to get a different kind of insight.

The main takeaway from this experiment is that modern CPUs have direct support for atomic integer operations, for example the LOCK prefix in x86, and std::atomic basically exists as a portable interface to those instructions (see also: What does the "lock" instruction mean in x86 assembly?). On aarch64, LDADD would be used.

This support allows for faster alternatives to more general mechanisms such as std::mutex, which can make more complex multi-instruction sections atomic, at the cost of being slower than std::atomic: std::mutex makes futex system calls on Linux, which is far slower than the userland instructions emitted by std::atomic. See also: Does std::mutex create a fence?

Let's consider the following multi-threaded program, which increments a global variable across multiple threads, with different synchronization mechanisms depending on which preprocessor define is used.

main.cpp

#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

size_t niters;

#if STD_ATOMIC
std::atomic_ulong global(0);
#else
uint64_t global = 0;
#endif

void threadMain() {
    for (size_t i = 0; i < niters; ++i) {
#if LOCK
        __asm__ __volatile__ (
            "lock incq %0;"
            : "+m" (global),
              "+g" (i) // to prevent loop unrolling
            :
            :
        );
#else
        __asm__ __volatile__ (
            ""
            : "+g" (i) // to prevent the loop from being optimized to a single add
            : "g" (global)
            :
        );
        global++;
#endif
    }
}

int main(int argc, char **argv) {
    size_t nthreads;
    if (argc > 1) {
        nthreads = std::stoull(argv[1], NULL, 0);
    } else {
        nthreads = 2;
    }
    if (argc > 2) {
        niters = std::stoull(argv[2], NULL, 0);
    } else {
        niters = 10;
    }
    std::vector<std::thread> threads(nthreads);
    for (size_t i = 0; i < nthreads; ++i)
        threads[i] = std::thread(threadMain);
    for (size_t i = 0; i < nthreads; ++i)
        threads[i].join();
    uint64_t expect = nthreads * niters;
    std::cout << "expect " << expect << std::endl;
    std::cout << "global " << global << std::endl;
}

GitHub upstream.

Compile, run and disassemble:

common="-ggdb3 -O3 -std=c++11 -Wall -Wextra -pedantic main.cpp -pthread"
g++ -o main_fail.out                    $common
g++ -o main_std_atomic.out -DSTD_ATOMIC $common
g++ -o main_lock.out       -DLOCK       $common

./main_fail.out       4 100000
./main_std_atomic.out 4 100000
./main_lock.out       4 100000

gdb -batch -ex "disassemble threadMain" main_fail.out
gdb -batch -ex "disassemble threadMain" main_std_atomic.out
gdb -batch -ex "disassemble threadMain" main_lock.out

Extremely likely "wrong" race-condition output for main_fail.out:

expect 400000
global 100000

and deterministic "correct" output of the others:

expect 400000
global 400000

Disassembly of main_fail.out:

   0x0000000000002780 <+0>:     endbr64 
   0x0000000000002784 <+4>:     mov    0x29b5(%rip),%rcx        # 0x5140 <niters>
   0x000000000000278b <+11>:    test   %rcx,%rcx
   0x000000000000278e <+14>:    je     0x27b4 <threadMain()+52>
   0x0000000000002790 <+16>:    mov    0x29a1(%rip),%rdx        # 0x5138 <global>
   0x0000000000002797 <+23>:    xor    %eax,%eax
   0x0000000000002799 <+25>:    nopl   0x0(%rax)
   0x00000000000027a0 <+32>:    add    $0x1,%rax
   0x00000000000027a4 <+36>:    add    $0x1,%rdx
   0x00000000000027a8 <+40>:    cmp    %rcx,%rax
   0x00000000000027ab <+43>:    jb     0x27a0 <threadMain()+32>
   0x00000000000027ad <+45>:    mov    %rdx,0x2984(%rip)        # 0x5138 <global>
   0x00000000000027b4 <+52>:    retq

Disassembly of main_std_atomic.out:

   0x0000000000002780 <+0>:     endbr64 
   0x0000000000002784 <+4>:     cmpq   $0x0,0x29b4(%rip)        # 0x5140 <niters>
   0x000000000000278c <+12>:    je     0x27a6 <threadMain()+38>
   0x000000000000278e <+14>:    xor    %eax,%eax
   0x0000000000002790 <+16>:    lock addq $0x1,0x299f(%rip)        # 0x5138 <global>
   0x0000000000002799 <+25>:    add    $0x1,%rax
   0x000000000000279d <+29>:    cmp    %rax,0x299c(%rip)        # 0x5140 <niters>
   0x00000000000027a4 <+36>:    ja     0x2790 <threadMain()+16>
   0x00000000000027a6 <+38>:    retq   

Disassembly of main_lock.out:

Dump of assembler code for function threadMain():
   0x0000000000002780 <+0>:     endbr64 
   0x0000000000002784 <+4>:     cmpq   $0x0,0x29b4(%rip)        # 0x5140 <niters>
   0x000000000000278c <+12>:    je     0x27a5 <threadMain()+37>
   0x000000000000278e <+14>:    xor    %eax,%eax
   0x0000000000002790 <+16>:    lock incq 0x29a0(%rip)        # 0x5138 <global>
   0x0000000000002798 <+24>:    add    $0x1,%rax
   0x000000000000279c <+28>:    cmp    %rax,0x299d(%rip)        # 0x5140 <niters>
   0x00000000000027a3 <+35>:    ja     0x2790 <threadMain()+16>
   0x00000000000027a5 <+37>:    retq

Conclusions:

  • the non-atomic version keeps the global in a register and increments the register.

    Therefore, at the end, it is very likely that four writes happen back to global with the same "wrong" value of 100000.

  • std::atomic compiles to lock addq. The LOCK prefix makes the following addq fetch, modify and update memory atomically.

  • our explicit inline-assembly LOCK prefix compiles to almost the same thing as std::atomic, except that our inc is used instead of add. It is not clear why GCC chose add, considering that our INC generates an encoding that is 1 byte smaller.

ARMv8 could use either LDAXR + STLXR or, on newer CPUs, LDADD: How do I start threads in plain C?

Tested on Ubuntu 19.10 AMD64, GCC 9.2.1, Lenovo ThinkPad P51.

I understand that std::atomic<> makes an object atomic.

That's a matter of perspective... you can't apply it to arbitrary objects and have their operations become atomic, but the provided specializations for (most) integral types and pointers can be used.

a = a + 12;

std::atomic<> does not (use template expressions to) simplify this to a single atomic operation; instead, the operator T() const volatile noexcept member does an atomic load() of a, then twelve is added, and operator=(T t) noexcept does an atomic store(t) of the result.
