Why isn't my std::atomic<int> variable thread-safe?

I don't know why my code isn't thread-safe; it outputs inconsistent results:

value 48
value 49
value 50
value 54
value 51
value 52
value 53

My understanding of an atomic object is that it prevents its intermediate state from being exposed, so it should solve the problem that arises when one thread is reading it while another thread is writing it.

I used to think I could use std::atomic without a mutex to solve the multi-threading counter increment problem, and it didn't look like the case.

I probably misunderstood what an atomic object is. Can someone explain?

#include <atomic>
#include <chrono>
#include <cstdio>
#include <functional>
#include <thread>

void
inc(std::atomic<int>& a)
{
  while (true) {
    a = a + 1;
    printf("value %d\n", a.load());
    std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  }
}

int
main()
{
  std::atomic<int> a(0);
  std::thread t1(inc, std::ref(a));
  std::thread t2(inc, std::ref(a));
  std::thread t3(inc, std::ref(a));
  std::thread t4(inc, std::ref(a));
  std::thread t5(inc, std::ref(a));
  std::thread t6(inc, std::ref(a));

  t1.join();
  t2.join();
  t3.join();
  t4.join();
  t5.join();
  t6.join();
  return 0;
}

"I used to think I could use std::atomic without a mutex to solve the multi-threading counter increment problem, and it didn't look like the case."

You can, just not the way you have coded it. You have to think about where the atomic accesses occur. Consider this line of code:

a = a + 1;
  1. First the value of a is fetched atomically. Let's say the value fetched is 50.
  2. We add one to that value, getting 51.
  3. Finally we atomically store that value into a using the = operator.
  4. a ends up being 51.
  5. We atomically load the value of a by calling a.load().
  6. We print the value we just loaded by calling printf().

So far so good. But between steps 1 and 3 some other threads may have changed the value of a, for example to 54. So when step 3 stores 51 into a, it overwrites the value 54, giving you the output you see.
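The interleaving described in steps 1 to 3 can be replayed deterministically on a single thread. This is only a sketch: replay_lost_update is a hypothetical helper name, and the store of 54 stands in for whatever the other threads did in the meantime.

```cpp
#include <atomic>

// Hypothetical helper: replays the problematic interleaving on one thread.
// Each individual load and store is atomic, but the sequence as a whole is not.
int replay_lost_update()
{
  std::atomic<int> a(50);

  int seen = a.load();  // step 1: "our" thread atomically loads 50
  int next = seen + 1;  // step 2: it computes 51 in a private temporary

  a.store(54);          // meanwhile, other threads have pushed a up to 54

  a.store(next);        // step 3: our thread atomically stores 51, discarding 54
  return a.load();      // the counter has gone backwards, from 54 to 51
}
```

Nothing here is a data race in the C++ sense; the updates are lost because a = a + 1 is a separate atomic load followed by a separate atomic store.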

As @Sopel and @Shawn suggest in the comments, you can atomically increment the value in a using one of the appropriate functions (like fetch_add) or operator overloads (like operator++ or operator+=). See the std::atomic documentation for details.
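As a minimal sketch of that suggestion, the function below increments a shared counter with fetch_add from several threads and returns the final count. The helper name count_with_fetch_add and its parameters are made up for illustration; the sleep and the printing from the question are left out so that only the increment is exercised.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: runs `threads` workers that each increment a shared
// counter `per_thread` times, then returns the final count.
int count_with_fetch_add(int threads, int per_thread)
{
  std::atomic<int> a(0);
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&a, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        // One indivisible read-modify-write; ++a and a += 1 do the same.
        a.fetch_add(1);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return a.load();
}
```

With six threads doing 10000 increments each, this should return exactly 60000, because no increment can be lost between a load and a store.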

Update

I added steps 5 and 6 above. Those steps can also lead to results that may not look correct.

Between the store at step 3 and the call to a.load() at step 5, other threads can modify the contents of a. After our thread stores 51 in a at step 3, it may find that a.load() returns a different number at step 5. Thus the thread that set a to 51 may not pass the value 51 to printf().

Another source of problems is that nothing coordinates the execution of steps 5 and 6 between two threads. For example, imagine two threads X and Y running on a single processor. One possible execution order is this:

  1. Thread X executes steps 1 through 5 above, incrementing a from 50 to 51 and getting the value 51 back from a.load().
  2. Thread Y executes steps 1 through 5 above, incrementing a from 51 to 52 and getting the value 52 back from a.load().
  3. Thread Y executes printf(), sending 52 to the console.
  4. Thread X executes printf(), sending 51 to the console.

We've now printed 52 on the console, followed by 51.

Finally, there's another problem lurking at step 6, because printf() doesn't make any promises about what happens if two threads call printf() at the same time (at least I don't think it does).

On a multiprocessor system, threads X and Y above might call printf() at exactly the same moment (or within a few ticks of exactly the same moment) on two different processors. We can't make any prediction about which printf() output will appear first on the console.

Note: The documentation for printf mentions a lock introduced in C++17 "… used to prevent data races when multiple threads read, write, position, or query the position of a stream." In the case of two threads simultaneously contending for that lock, we still can't tell which one will win.

Besides the increment of a being done non-atomically, the fetch of the value to display is not atomic with respect to the increment. One of the other threads may increment a after the current thread has incremented it but before the current thread fetches the value to display. This could result in the same value being shown twice, with the previous value skipped.

Another issue here is that the threads do not necessarily run in the order they were created. For example, thread 6 could print its output before threads 3, 4, and 5, but after all four of those threads have incremented a. Since the thread that did the last increment displays its output earlier, you end up with output that is not sequential. This is more likely to happen on a system with fewer than six hardware threads available to run on.

Adding a small sleep between the various thread creations (e.g., sleep_for(10)) would make this less likely to occur, but would still not eliminate the possibility. The only sure way to keep the output ordered is to use some sort of exclusion (like a mutex) to ensure only one thread at a time has access to the increment and output code, and to treat the increment and the output as a single transaction that must run together before another thread tries to do an increment.
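A minimal sketch of that "single transaction" idea follows, assuming a std::mutex guards both the increment and the printf. The helper name count_in_order is hypothetical, and it also returns the values in print order purely so the ordering can be checked afterwards.

```cpp
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: each worker increments and prints under one lock.
// Returns the values in the order they were printed.
std::vector<int> count_in_order(int threads, int per_thread)
{
  int a = 0;                 // plain int: every access happens under the mutex
  std::mutex m;
  std::vector<int> printed;  // also protected by m
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < per_thread; ++i) {
        // Increment and output form one critical section, so no other
        // thread can slip in between the two.
        std::lock_guard<std::mutex> lock(m);
        ++a;
        std::printf("value %d\n", a);
        printed.push_back(a);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return printed;
}
```

Because the counter is only touched while the lock is held, it no longer needs to be a std::atomic at all. The price is that the threads are fully serialized around the I/O.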

The other answers point out the non-atomic increment and various problems. I mostly want to point out some interesting practical details about exactly what we see when running this code on a real system (x86-64 Arch Linux, gcc9.1 -O3, i7-6700k 4c8t Skylake).

It can be useful to understand why certain bugs or design choices lead to certain behaviours, for troubleshooting / debugging.


Use int tmp = ++a; to capture the fetch_add result in a local variable instead of reloading it from the shared variable. (And as 1202ProgramAlarm says, you might want to treat the whole increment and print as an atomic transaction if you insist on having your counts printed in order as well as being done properly.)

Or you might want to have each thread record the values it saw in a private data structure to be printed later, instead of also serializing threads with printf during the increments. (In practice, all threads trying to increment the same atomic variable will serialize while waiting for access to the cache line; the ++a operations happen in some order, so you can tell from the modification order which thread went when.)
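Both suggestions can be combined in one sketch: capture the result of ++a in a local, and log it to a per-thread vector instead of printing during the increments. The helper name collect_counts and the merge-and-sort step at the end are assumptions added so the result is easy to inspect.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: every worker logs the values its own increments
// produced, then the logs are merged and sorted after all threads finish.
std::vector<int> collect_counts(int threads, int per_thread)
{
  std::atomic<int> a(0);
  std::vector<std::vector<int>> seen(threads);  // one private log per thread
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&a, &seen, t, per_thread] {
      seen[t].reserve(per_thread);
      for (int i = 0; i < per_thread; ++i) {
        int tmp = ++a;           // the value this increment produced; a is not reloaded
        seen[t].push_back(tmp);  // only thread t ever touches seen[t]
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }

  std::vector<int> all;
  for (const auto& log : seen) {
    all.insert(all.end(), log.begin(), log.end());
  }
  std::sort(all.begin(), all.end());
  return all;
}
```

If the increments are truly atomic, the merged result contains every value from 1 to threads * per_thread exactly once, with no duplicates and no gaps.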


Fun fact: a.store(1 + a.load(std::memory_order_relaxed), std::memory_order_release) is what you might do for a variable that was only written by 1 thread, but read by multiple threads. You don't need an atomic RMW because no other thread ever modifies it. You just need a thread-safe way to publish updates. (Or better, in a loop keep a local counter and just .store() it without loading from the shared variable.)
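Here is a sketch of that single-writer pattern, assuming exactly one writer thread and one polling reader. The helper name single_writer_publish and the monotonicity check are illustrative additions, not part of the original code.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: one writer publishes `updates` increments while one
// reader polls. Returns true if the reader never saw the value go backwards
// and the final value equals `updates`.
bool single_writer_publish(int updates)
{
  std::atomic<int> a(0);
  bool monotonic = true;  // written only by the reader, read after join()

  std::thread reader([&] {
    int last = 0;
    while (last < updates) {
      int now = a.load(std::memory_order_acquire);
      if (now < last) {
        monotonic = false;
      }
      last = now;
    }
  });

  std::thread writer([&] {
    for (int i = 0; i < updates; ++i) {
      // Safe only because this is the sole thread that ever writes a:
      // nothing can change a between the relaxed load and the release store.
      a.store(1 + a.load(std::memory_order_relaxed), std::memory_order_release);
    }
  });

  writer.join();
  reader.join();
  return monotonic && a.load() == updates;
}
```

Adding a second writer would reintroduce exactly the lost-update problem from the question, so this pattern depends on the single-writer assumption holding.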

If you used the default a = ... for a sequentially-consistent store, you might as well have done an atomic RMW on x86. One good way to compile such a store is with an atomic xchg; the alternative, mov + mfence, is as expensive (or more).


What's interesting is that despite the massive problems with your code, no counts were lost or stepped on (no duplicate counts), merely printing reordered. So in practice the danger wasn't encountered because of other effects going on.

I tried it on my own machine and did lose some counts. But after removing the sleep, I just got reordering. (I copy-pasted about 1000 lines of the output into a file, and running sort -u to uniquify the output didn't change the line count. It did move some late prints around, though; presumably one thread got stalled for a while.) My testing didn't check for the possibility of lost counts, i.e. values skipped because the thread reloads a instead of saving the value it stored. I'm not sure there's a plausible way for that to happen here without multiple threads reading the same count, which would be detected.

Store + reload, even a seq-cst store which has to flush the store buffer before it can reload, is very fast compared to printf making a write() system call. (The format string includes a newline and I didn't redirect output to a file, so stdout is line-buffered and can't just append the string to a buffer.)

(write() system calls on the same file descriptor are serializing in POSIX: write(2) is atomic. Also, printf(3) itself is thread-safe on GNU/Linux, as required by C++17, and probably by POSIX long before that.)

Stdio locking in printf happens to be enough serialization in almost all cases: the thread that just unlocked stdout and left printf can do the atomic increment and then try to take the stdout lock again.

The other threads were all blocked trying to take the lock on stdout. One (other?) thread can wake up and take the lock on stdout, but for its increment to race with the other thread, it would have to enter and leave printf and load a the first time before that other thread commits its a = ... seq-cst store.

This does not mean it's actually safe

Just that testing this specific version of the program (at least on x86) doesn't easily reveal the lack of safety. Interrupts or scheduling variations, including competition from other things running on the same machine, certainly could block a thread at just the wrong time.

My desktop has 8 logical cores, so there were enough for every thread to get one, not having to get descheduled. (Although normally that would tend to happen on I/O or when waiting on a lock anyway.)


With the sleep there, it is not unlikely for multiple threads to wake up at nearly the same time and race with each other in practice on real x86 hardware. The sleep is so long that timer granularity becomes a factor, I think. Or something like that.


Redirecting output to a file

With stdout open on a non-TTY file, it's full-buffered instead of line-buffered, and doesn't always make a system call while holding the stdout lock.

(I got a 17MiB file in /tmp from hitting control-C a fraction of a second after running ./a.out > output.)

This makes it fast enough for threads to actually race with each other in practice, showing the expected bugs of duplicate values. (A thread reads a but loses ownership of the cache line before it stores (tmp)+1, resulting in two or more threads doing the same increment. And/or multiple threads read the same value when they reload a after flushing their store buffer.)

1228589 unique lines (sort -u | wc) out of 1291035 total lines of output. So 62446 of the output lines (~5%) were duplicates.

I didn't check whether it was usually one value duplicated multiple times or usually only one duplicate, or how far backward the value ever jumped. If a thread happened to be stalled by an interrupt handler after loading but before storing val+1, the jump could be quite far. Or if the thread actually slept or blocked for some reason, it could rewind the count indefinitely far.
