
How is std::atomic implemented

I'm studying the difference between mutex and atomic in C++11.

As I understand it, a mutex is a locking mechanism implemented on top of the OS/kernel. For example, Linux offers futex ; with the help of futex , we can implement mutexes and semaphores. Furthermore, I know that futex itself is implemented with low-level atomic operations such as compare-and-swap ( CompareAndSwap / CompareAndSet ).

For std::atomic , I know that it is built on the memory model introduced by C++11. However, I don't know how the memory model is implemented at the low level. If it is also implemented with atomic operations like compare-and-swap, what is the difference between std::atomic and a mutex?

In a word, if std::atomic::is_lock_free gives me false , I'd say that std::atomic is the same as a mutex. But if it gives me true , how is it implemented at the low level?

If atomic operations are lock_free , they're probably implemented the same way the components of a mutex are implemented. After all, to lock a mutex, you really want some kind of atomic operation to ensure that one and only one thread locks the mutex.

The difference is, atomic operations that are lock free don't have a "locked" state. Let's compare two possible ways to do an atomic increment of a variable:

First, the mutex way. We lock a mutex, we read-increment-write the variable, then we unlock the mutex. If the thread gets interrupted during the read-increment-write, other threads that attempt to perform this same operation will block trying to lock the mutex. (See Where is the lock for a std::atomic? for how this works on some real implementations, for objects too large to be lock_free.)

Second, the atomic way. The CPU "locks" just the cache line holding the variable we want to modify for the duration of a single read-increment-write instruction. (This means the CPU delays responding to MESI requests to invalidate or share that line, keeping exclusive access so no other core can look at it. MESI cache coherency always requires exclusive ownership of a cache line before a core can modify it, so this is cheap if the core already owned the line.) A thread cannot be interrupted in the middle of a single instruction. Another thread that attempts to access this variable, at worst, has to wait for the cache-coherency hardware to sort out who may modify the memory location.

So how do we lock a mutex? Likely we perform an atomic compare-and-swap. So light atomic operations are the primitives from which heavy mutex operations are assembled.

Of course this is all platform-specific. But this is what typical, modern platforms that you are likely to use do.

what is the difference between std::atomic and mutex

A mutex is a concurrency construct, independent of any user data, offering lock and unlock methods that let you protect (enforce mutual exclusion within) a region of code. You can put whatever you want in that region.

std::atomic<T> is an adapter over a single instance of a type T , allowing atomic access to that object on a per-operation basis.

A mutex is more general in the sense that one possible implementation of std::atomic is to protect all access to the underlying object with a mutex.

std::atomic exists mostly because of the other common implementation: using an atomic instruction 2 to execute the operation directly without requiring a mutex. This is the implementation used when std::atomic<T>::is_lock_free() returns true. This is generally more efficient than the mutex approach, but is only applicable for objects small enough to be manipulated "in one shot" by atomic instructions.


2 In some cases, the compiler is able to use plain instructions (rather than special concurrency related instructions) such as normal loads and stores if they offer the required guarantees on the platform in question.

For example, on x86, compilers implement all std::atomic loads of small-enough values with plain loads, and implement all stores weaker than memory_order_seq_cst with plain stores. seq_cst stores, however, are implemented with special instructions: a trailing mfence after a mov on GCC before 10.1, and an (implicitly lock -prefixed) xchg mem,reg on clang, recent GCC, and other compilers.

Note also that this asymmetry between loads and stores is a compiler choice: they could have put the special handling on seq_cst loads instead, but because loads generally outnumber stores, that would be slower in most cases. (And because cheap loads in fast paths are more valuable.)
