Can std::atomic sometimes be used instead of std::mutex in C++?

Question

I suppose that std::atomic can sometimes replace uses of std::mutex. But is it always safe to use an atomic instead of a mutex? Example code:

std::atomic_flag f, ready; // shared

// ..... Thread 1 (and others) ....
while (true) {
    // ... Do some stuff in the beginning ...
    while (f.test_and_set()); // spin, acquire system lock
    if (ready.test()) {
        UseSystem(); // .... use our system for 50-200 nanoseconds ....
    }
    f.clear(); // release lock
    // ... Do some stuff at the end ...
}

// ...... Thread 2 .....
while (true) {
    // ... Do some stuff in the beginning ...
    InitSystem();
    ready.test_and_set(); // signify system ready
    // .... sleep for 10-30 milli-seconds ....
    while (f.test_and_set()); // acquire system lock
    ready.clear(); // signify system shutdown
    f.clear(); // release lock
    DeInitSystem(); // finalize/destroy system
    // ... Do some stuff at the end ...
}

Here I use std::atomic_flag to protect use of my system (some complex library). But is this code safe? I assume that if ready is false then the system is not available and I can't use it, and if it is true then the system is available and I can use it. For simplicity, suppose that the code above doesn't throw exceptions.

Of course I can use std::mutex to protect reads and modifications of my system. But right now I need very high-performance code in Thread-1, which should use atomics instead of mutexes as often as possible (Thread-2 can be slow and use mutexes if needed).

In Thread-1 the system-usage code (inside the while loop) runs very often, each iteration taking around 50-200 nanoseconds. So using extra mutexes would be too heavy. But Thread-2 iterations are quite long: as you can see, in each iteration of the while loop it sleeps for 10-30 milliseconds once the system is ready, so using mutexes only in Thread-2 is quite alright.

Thread-1 is an example of one thread; in my real project there are several threads running the same (or very similar) code as Thread-1.

I'm concerned about memory operation ordering: it can probably happen sometimes that the system is not yet in a fully consistent state (not yet fully initialized) when ready becomes true in Thread-1. It may also happen that ready becomes false in Thread-1 too late, when the system has already performed some destruction (deinit) operations. Also, as you can see, the system can be initialized and destroyed many times in the loop of Thread-2 and used many times in Thread-1 whenever it is ready.

Can my task be solved somehow without std::mutex and other heavy stuff in Thread-1, using only std::atomic (or std::atomic_flag)? Thread-2 can use heavy synchronization (mutexes etc.) if needed.

Basically, Thread-2 should somehow propagate the whole initialized state of the system to all cores and other threads before ready becomes true, and Thread-2 should also propagate ready equal to false before any single small operation of system destruction (deinit) is done. By propagating state I mean that all of the system's initialized data should be written 100% consistently to global memory and to the caches of other cores, so that other threads see a fully consistent system whenever ready is true.

It is even acceptable to make a small (milliseconds) pause after system init and before ready is set to true, if that improves the situation and the guarantees. It is also acceptable to pause after ready is set to false and before starting system destruction (deinit). Doing some expensive CPU operations in Thread-2 is alright too, if there exist operations like "propagate all Thread-2 writes to global memory and to the caches of all other CPU cores and threads".

Update: As a solution to my question above, for now I have decided to use the following code with std::atomic_flag in place of std::mutex in my project:

std::atomic_flag f = ATOMIC_FLAG_INIT; // shared
// .... Later in all threads ....
while (f.test_and_set(std::memory_order_acquire)) // try acquiring
    std::this_thread::yield();
shared_value += 5; // Any code, it is lock-protected.
f.clear(std::memory_order_release); // release

The solution above takes 9 nanoseconds on average (measured over 2^25 operations) in a single thread (release build) on my Windows 10 64-bit 2 GHz 2-core laptop, while using std::unique_lock<std::mutex> lock(mux); for the same protection takes 100-120 nanoseconds on the same Windows PC. If threads need to spin instead of sleeping while waiting, then instead of std::this_thread::yield(); in the code above I just use a semicolon ;. Full online example of usage and time measurements.

Answer 1

I'll ignore your code for the sake of the answer; the answer generally is yes.

A lock does the following things:

allows only one thread to acquire it at any given time
when the lock is acquired, a read barrier is placed
right before the lock is released, a write barrier is placed

The combination of the three points above makes the critical section thread-safe: only one thread can touch the shared memory, all changes are observed by the locking thread because of the read barrier, and all changes become visible to other locking threads because of the write barrier.
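As a rough illustration of the three properties listed above, here is a minimal sketch of a lock built from a single std::atomic<bool>. The names SpinLock and count_with_spinlock are made up for this example and do not come from the question. In C++ memory-model terms, the "read barrier" roughly corresponds to an acquire operation on lock and the "write barrier" to a release operation on unlock. This is a sketch to show the idea, not a recommendation to replace std::mutex with it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock, for illustration only.
class SpinLock {
public:
    void lock() {
        // Property 1: exchange() returns the previous value, so only the one
        // thread that flips false -> true leaves the loop.
        // Property 2: acquire ordering makes writes released by the previous
        // owner visible to this thread.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }

    void unlock() {
        // Property 3: release ordering publishes this thread's writes to the
        // next thread that acquires the lock.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical demo: several threads increment a plain (non-atomic) counter,
// relying only on the lock for safety.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // critical section
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling count_with_spinlock(4, 10000) should return 40000; with the lock removed, the increments would race and the result would be undefined.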

Can you use atomics to achieve it? Yes. Real-life locks (provided, for example, by Win32/POSIX) ARE implemented either directly with atomics and lock-free programming, or with lower-level locks that are themselves built on atomics and lock-free programming.

Now, realistically speaking, should you use a self-written lock instead of the standard locks? Absolutely not.

Many concurrency tutorials perpetuate the notion that spin-locks are "more efficient" than regular locks. I can't stress enough how foolish this is. A user-mode spinlock IS NEVER more efficient than a lock that the OS provides. The reason is simple: OS locks are wired to the OS scheduler. So if a thread tries to lock a lock and fails, the OS knows to freeze this thread and not reschedule it until the lock has been released.

With user-mode spinlocks, this doesn't happen. The OS can't know that the relevant thread is trying to acquire the lock in a tight loop. Yielding is just a patch and not a solution: we want to spin for a short time, then go to sleep until the lock is released. With user-mode spinlocks, we might waste the entire thread quantum trying to lock the spinlock and yielding.

I will say, for the sake of honesty, that recent C++ standards (C++20 and later) do give us the ability to sleep on an atomic, waiting for it to change its value. So we can, in a very lame way, implement our own "real" locks that try to spin for a while and then sleep until the lock is released. However, implementing a correct and efficient lock when you're not a concurrency expert is pretty much impossible.
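To make the "spin for a while, then sleep" idea concrete, here is a minimal sketch using the C++20 wait and notify_one members of std::atomic. The class name SpinThenWaitLock, the helper count_with_wait_lock, and the spin limit of 100 are all assumptions made for this example, and the limit has not been tuned. The sketch falls back to yielding when the standard library does not advertise atomic waiting (the __cpp_lib_atomic_wait feature-test macro is not defined), so the sleeping behaviour requires compiling as C++20. In line with the caveat above, treat it as a demonstration of the mechanism, not as a production lock.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spin-then-sleep lock, for illustration only.
class SpinThenWaitLock {
public:
    void lock() {
        // Phase 1: spin briefly, hoping the current owner releases soon.
        for (int spin = 0; spin < kSpinLimit; ++spin) {
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
        }
        // Phase 2: block until the flag is no longer true, then retry.
        while (locked_.exchange(true, std::memory_order_acquire)) {
#ifdef __cpp_lib_atomic_wait
            // Returns immediately if the value already differs from true,
            // so an unlock between exchange() and wait() is not missed.
            locked_.wait(true, std::memory_order_relaxed);
#else
            std::this_thread::yield();  // pre-C++20 fallback: no real sleep
#endif
        }
    }

    void unlock() {
        locked_.store(false, std::memory_order_release);
#ifdef __cpp_lib_atomic_wait
        locked_.notify_one();  // wake one sleeping waiter, if any
#endif
    }

private:
    static constexpr int kSpinLimit = 100;  // assumed value, not tuned
    std::atomic<bool> locked_{false};
};

// Hypothetical demo: a plain counter protected only by the lock.
long count_with_wait_lock(int threads, int iterations) {
    SpinThenWaitLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // critical section
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Note that this lock is not fair: a thread that is still spinning can grab the lock ahead of a waiter that has just been woken, which is one of the many details a real implementation has to weigh.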

My own philosophical opinion is that in 2021, developers should rarely deal with these very low-level concurrency topics. Leave those things to the kernel guys. Use some high-level concurrency library and focus on the product you want to develop rather than micro-optimizing your code. This is concurrency, where correctness >>> efficiency.
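For comparison, here is a sketch of how the pattern from the question might look with only the standard std::mutex and std::lock_guard. Everything in it is a hypothetical stand-in: System, GuardedSystem, and run_guarded_demo are invented for this example, and the real InitSystem, UseSystem, and DeInitSystem calls are represented by trivial field updates. The ready flag is a plain bool because it is only ever read or written while the mutex is held.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the "complex library" in the question.
struct System {
    bool initialized = false;
    long uses = 0;
};

// Hypothetical wrapper: every access to the system goes through one mutex.
class GuardedSystem {
public:
    void init() {  // Thread-2: InitSystem() and mark as ready
        std::lock_guard<std::mutex> guard(mutex_);
        system_.initialized = true;
        ready_ = true;
    }

    void shutdown() {  // Thread-2: mark as not ready and DeInitSystem()
        std::lock_guard<std::mutex> guard(mutex_);
        ready_ = false;
        system_.initialized = false;
    }

    bool try_use() {  // Thread-1: use the system only while it is ready
        std::lock_guard<std::mutex> guard(mutex_);
        if (!ready_) return false;
        ++system_.uses;  // stands in for UseSystem()
        return true;
    }

    long uses() {
        std::lock_guard<std::mutex> guard(mutex_);
        return system_.uses;
    }

private:
    std::mutex mutex_;
    bool ready_ = false;  // protected by mutex_, so no atomic is needed
    System system_;
};

// Hypothetical demo: several user threads run while the system is ready.
long run_guarded_demo(GuardedSystem& guarded, int threads, int iterations) {
    guarded.init();
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) guarded.try_use();
        });
    }
    for (auto& w : workers) w.join();
    guarded.shutdown();
    return guarded.uses();
}
```

Unlike the original code, this version runs init and shutdown while holding the lock, so a user thread can never observe a half-initialized or half-destroyed system. Whether the extra cost per use is acceptable is exactly the trade-off the question raises, and it would need to be measured on the target platform.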

A related rant by Linus Torvalds
