Thread-local acquire/release synchronization

Question

In general, load-acquire/store-release synchronization is one of the most common forms of memory-ordering based synchronization in the C++11 memory model. It's basically how a mutex provides memory ordering. The "critical section" between a load-acquire and a store-release is always synchronized among different observer threads, in the sense that all observer threads will agree on what happens after the acquire and before the release.

Generally, this is achieved with a read-modify-write instruction, like compare-exchange, along with an acquire barrier, when entering the critical section, and another read-modify-write instruction with a release barrier when exiting the critical section.

But there are some situations where you might have a similar critical section [1] between a load-acquire and a release-store, except only one thread actually modifies the synchronization variable. Other threads may read the synchronization variable, but only one thread actually modifies it. In this case, when entering the critical section, you don't need a read-modify-write instruction. You would just need a simple store, since you are not racing with other threads that are attempting to modify the synchronization flag. (This may seem odd, but note that many lock-free memory reclamation deferral patterns, like user-space RCU or epoch based reclamation, use thread-local synchronization variables that are written to only by one thread, but read by many threads, so this isn't too weird of a situation.)

So, when entering the critical section, you could just do something like:

sync_var.store(true, ...);

.... critical section ....

sync_var.store(false, std::memory_order_release);

There is no race, because, again, there is no need for a read-modify-write when only one thread needs to set/unset the critical section variable. Other threads can simply read the critical section variable with a load-acquire.

The problem is, when you're entering the critical section, you need an acquire operation or fence. But you don't need to do a LOAD, only a STORE. So what is a good way to produce acquire ordering when you only really need a STORE? I see only two real options that fall within the C++ memory model. Either:

1. Use an exchange instead of a store, so you can do sync_var.exchange(true, std::memory_order_acquire). The downside here is that exchange is a heavier-weight read-modify-write operation, when all you really need is a simple store.
2. Insert a "dummy" load-acquire, like:

   (void)sync_var.load(std::memory_order_acquire);
   sync_var.store(true, std::memory_order_relaxed);
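
For concreteness, the two candidate ways of entering the section might be written out as below. This is only a sketch of their shape: the function names (enter_with_exchange, enter_with_dummy_load, leave, is_active) are hypothetical, and the sketch assumes sync_var is a std::atomic<bool> written by a single thread. It does not claim that either variant provides the ordering being asked about.

```cpp
#include <atomic>

// Assumed declaration: a flag written by one thread only, read by many.
std::atomic<bool> sync_var{false};

// Option 1: a read-modify-write with acquire ordering.
void enter_with_exchange() {
    sync_var.exchange(true, std::memory_order_acquire);
}

// Option 2: a "dummy" acquire load followed by a relaxed store.
// The load's result is deliberately discarded.
void enter_with_dummy_load() {
    (void)sync_var.load(std::memory_order_acquire);
    sync_var.store(true, std::memory_order_relaxed);
}

// Leaving the section is the same in both cases: a release store.
void leave() {
    sync_var.store(false, std::memory_order_release);
}

// What an observer thread would do: an acquire load of the flag.
bool is_active() {
    return sync_var.load(std::memory_order_acquire);
}
```

Both variants leave the flag in the same state; they differ only in which atomic operations are issued on entry.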

The "dummy" load-acquire seems better. Presumably, the compiler can't optimize away the unused load, because it's an atomic instruction that has the side-effect of producing a "synchronizes-with" relationship with a release operation on sync_var. But it also seems very hacky, and the intention is unclear without comments explaining what's going on.

So what is the best way to produce acquire semantics when all we need to do is a simple store?

[1] I use the term "critical section" loosely. I don't necessarily mean a section that is always accessed via mutual exclusion. Rather, I just mean any section where memory ordering is synchronized via acquire-release semantics. This could refer to a mutex, or it could just mean something like RCU, where the critical section can be accessed concurrently by multiple readers.

Answer 1

The flaw in your logic is the assumption that an atomic RMW is not required because data in the critical section is modified by a single thread while all other threads only have read access.
This is not true; there still needs to be a well-defined order between reading and writing. You don't want data to be modified while another thread is still reading it. Therefore, each thread needs to inform other threads when it has finished accessing the data.

By only using an atomic store to enter the critical section, the 'synchronizes-with' relationship cannot be established. Acquire/release synchronization is based on a runtime relationship where the acquirer knows that synchronization is complete only after observing a particular value returned by the atomic load. This can never be achieved by a single atomic store, since the one modifying thread can change the atomic variable sync_var at any time and, as such, has no way of knowing whether another thread is still reading the data.
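
As a minimal sketch of that runtime relationship: in the example below, the consumer reads the plain data only after its acquire load has actually observed the value written by the producer's release store. The names run_handoff, payload, ready and seen are hypothetical and chosen just for this illustration.

```cpp
#include <atomic>
#include <thread>

// Hands one value from a producer thread to a consumer thread.
int run_handoff() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // synchronization flag
    int seen = 0;

    std::thread producer([&] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });

    std::thread consumer([&] {
        // Spin until the acquire load observes the released value.
        // Only then does the store synchronize with this load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // safe: ordered after the producer's write
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The synchronization here hinges on the consumer seeing true from the load; a load that happens to return false establishes nothing about payload.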

The option with a 'dummy' load/acquire is also invalid because it fails to inform other threads that it wants exclusive access. You attempt to solve that by using a single (relaxed) store, but the load and the store are separate operations that can be interrupted by other threads (i.e., multiple threads simultaneously accessing the critical area).

An atomic RMW must be used by each thread to load a particular value and, at the same time, update the variable to inform all other threads that it now has exclusive access (regardless of whether that is for reading or writing).

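void lock()
{
    // Spin until the previous value was false, i.e. this thread took the lock.
    while (sync_var.exchange(true, std::memory_order_acquire)) {}
}

void unlock()
{
    // Publish all writes made while holding the lock.
    sync_var.store(false, std::memory_order_release);
}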

Optimizations are possible where multiple threads have read access at the same time (e.g., std::shared_mutex).
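
To see the exchange/store pair from this answer in use, the sketch below wraps it in a hypothetical SpinLock struct and has several threads increment a plain counter under it. The names SpinLock, flag and count_with_spinlock are assumptions made for this example, not part of the original code.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Same acquire-exchange / release-store pair as lock()/unlock() above.
struct SpinLock {
    std::atomic<bool> flag{false};

    void lock() {
        // Spin until the previous value was false.
        while (flag.exchange(true, std::memory_order_acquire)) {}
    }

    void unlock() {
        flag.store(false, std::memory_order_release);
    }
};

// Increments a non-atomic counter from several threads under the lock.
long count_with_spinlock(int threads, int iterations) {
    SpinLock spin;
    long counter = 0;
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&spin, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<SpinLock> guard(spin);  // lock() / unlock()
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Because every thread enters through the RMW, no increment is lost, so count_with_spinlock(4, 10000) should return 40000.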

1 answers

solution1
3 2018-04-03 20:25:25
