I'm studying the difference between `mutex` and `atomic` in C++11.
As I understand it, a `mutex` is a locking mechanism implemented on top of the OS/kernel. For example, Linux offers `futex`, and with its help we can implement `mutex` and `semaphore`. Furthermore, I know that `futex` is itself built on low-level atomic operations such as `CompareAndSet`/`CompareAndSwap`.
As for `std::atomic`, I know it is specified in terms of the memory model introduced by C++11. However, I don't know how the memory model is implemented at the low level. If it is also implemented with atomic operations like `CompareAndSet`, what is the difference between `std::atomic` and `mutex`?
In a word: if `std::atomic::is_lock_free` gives me `false`, I would say `std::atomic` is the same as a `mutex`. But if it gives me `true`, how is it implemented at the low level?
If atomic operations are lock-free, they're probably implemented the same way the components of a mutex are implemented. After all, to lock a mutex, you really want some kind of atomic operation to ensure that one and only one thread locks the mutex.
The difference is that lock-free atomic operations don't have a "locked" state. Let's compare two possible ways to do an atomic increment of a variable:
First, the mutex way. We lock a mutex, we read-increment-write the variable, then we unlock the mutex. If the thread gets interrupted during the read-increment-write, other threads that attempt to perform this same operation will block trying to lock the mutex. (See "Where is the lock for a std::atomic?" for how this works on some real implementations, for objects too large to be lock-free.)
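The mutex way can be sketched like this (the counter and function names are illustrative, not from the question):

```cpp
#include <mutex>

// A shared counter protected by a mutex.
std::mutex counter_mutex;
long counter = 0;

long mutex_increment() {
    std::lock_guard<std::mutex> guard(counter_mutex); // other threads block here
    return ++counter; // read-increment-write happens entirely under the lock
}
```

If the owning thread is preempted between `lock` and `unlock`, every other caller of `mutex_increment` waits on the mutex.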
Second, the atomic way. The CPU "locks" just the cache line holding the variable we want to modify, for the duration of a single read-increment-write instruction. (This means the CPU delays responding to MESI requests to invalidate or share the cache line, keeping exclusive access so no other CPU can look at it. MESI cache coherency always requires exclusive ownership of a cache line before a core can modify it, so this is cheap if we already owned the line.) It is not possible for us to get interrupted during an instruction. Another thread that attempts to access this variable, at worst, has to wait for the cache coherency hardware to sort out who can modify the memory location.
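The atomic way, for the same (illustrative) counter, has no mutex and no "locked" state:

```cpp
#include <atomic>

std::atomic<long> atomic_counter{0};

long atomic_increment() {
    // fetch_add is a single atomic read-increment-write; on x86 it
    // typically compiles to one `lock xadd` instruction.
    return atomic_counter.fetch_add(1) + 1; // returns the incremented value
}
```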
So how do we lock a mutex? Likely we perform an atomic compare-and-swap. So light atomic operations are the primitives from which heavy mutex operations are assembled.
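To make that concrete, here is a minimal (hypothetical) spinlock assembled from a compare-and-swap; a real mutex would additionally fall back to a kernel wait (e.g. via `futex`) instead of spinning:

```cpp
#include <atomic>

class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        bool expected = false;
        // Atomically: if locked == false, set it to true; otherwise retry.
        while (!locked.compare_exchange_weak(expected, true,
                                             std::memory_order_acquire)) {
            expected = false; // compare_exchange wrote the observed value here
        }
    }
    bool try_lock() {
        bool expected = false;
        return locked.compare_exchange_strong(expected, true,
                                              std::memory_order_acquire);
    }
    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};
```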
Of course this is all platform-specific. But this is what typical, modern platforms that you are likely to use do.
> what is the difference between `std::atomic` and `mutex`
A mutex is a concurrency construct, independent of any user data, offering `lock` and `unlock` methods that allow you to protect (enforce mutual exclusion within) a region of code. You can put whatever you want in that region.
`std::atomic<T>` is an adapter over a single instance of a type `T`, allowing atomic access to that object on a per-operation basis.
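"Per operation" matters: each call on the atomic is individually atomic, but a sequence of calls is not one critical section. A sketch (names are illustrative):

```cpp
#include <atomic>

std::atomic<int> n{0};

int read_modify() {
    n.fetch_add(2); // atomic by itself
    n.fetch_sub(1); // atomic by itself, but another thread could observe
                    // n between the two calls
    return n.load();
}
```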
A mutex is more general in the sense that one possible implementation of `std::atomic` is to protect all access to the underlying object with a mutex.
`std::atomic` exists mostly because of the other common implementation: using an atomic instruction² to execute the operation directly, without requiring a mutex. This is the implementation used when `std::atomic<T>::is_lock_free()` returns `true`. It is generally more efficient than the mutex approach, but is only applicable to objects small enough to be manipulated "in one shot" by atomic instructions.
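You can query which implementation you got. Note that the result is platform-dependent; the assumption below (a plain `int` is lock-free) holds on mainstream 64-bit platforms but is not guaranteed by the standard, which only guarantees lock-freedom for `std::atomic_flag`:

```cpp
#include <atomic>

bool int_is_lock_free() {
    std::atomic<int> x{0};
    // true when operations on x use atomic instructions directly;
    // false when the implementation falls back to an internal lock
    // (as it typically does for types too large for one instruction).
    return x.is_lock_free();
}
```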
² In some cases, the compiler is able to use plain instructions (rather than special concurrency-related instructions), such as normal loads and stores, if they offer the required guarantees on the platform in question.
For example, on x86, compilers implement all `std::atomic` loads of small-enough values with plain loads, and implement all stores weaker than `memory_order_seq_cst` with plain stores. `seq_cst` stores, however, are implemented with special instructions: a trailing `mfence` after `mov` on GCC before 10.1, and an (implicitly `lock`ed) `xchg mem,reg` on clang, recent GCC, and other compilers.
Note also that the asymmetry between loads and stores is a compiler choice: they could have put the special handling on `seq_cst` loads instead, but because loads generally outnumber stores, that would be slower in most cases. (And because cheap loads on fast paths are more valuable.)