Is using std::atomic_thread_fence right before an atomic load/store with the same order always redundant?

Question

Given:

#include <atomic>
#include <cstdint>

std::atomic<uint64_t> b;

void f()
{
    std::atomic_thread_fence(std::memory_order::memory_order_acquire);

    uint64_t a = b.load(std::memory_order::memory_order_acquire);

    // code using a...
}

Can removing the call to std::atomic_thread_fence have any effect? If so, is there a succinct example? Keep in mind that other functions may store to or load from b and call f.

Answer 1

Never redundant. atomic_thread_fence actually has stricter ordering requirements than a load with mo_acquire. It's poorly documented, but the acquire fence isn't one-way permeable for loads; it preserves Read-Read and Read-Write order between loads before the fence and loads/stores after it.

Load-acquires, on the other hand, only require ordering between that load and subsequent loads and stores. Read-Read and Read-Write order is enforced ONLY between that particular load-acquire and the accesses that follow it. Prior loads/stores (in program order) have no restrictions. Thus the load-acquire is one-way permeable.

The release fence similarly loses one-way permeability for stores: it keeps every load and store before the fence ordered ahead of the stores after it, preserving Read-Write and Write-Write order. See Jeff Preshing's article https://preshing.com/20130922/acquire-and-release-fences/ .

By the way, it looks like you have your fence on the wrong side. See Preshing's other article https://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/ . With an acquire-load, the load happens before the acquire, so using fences it would look like this:

uint64_t a = b.load(std::memory_order::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order::memory_order_acquire);

Remember that release doesn't guarantee visibility. All release does is guarantee the order in which writes to different variables become visible in other threads. (Without this, other threads can observe orderings that seem to violate cause-and-effect.)
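To make the fence placement concrete, here is a minimal sketch of the usual message-passing pattern, assuming a producer that publishes a plain int through a flag. The names payload, ready and message_passing_demo are made up for illustration and are not from the question. The relaxed load is followed by the acquire fence, and the fence pairs with the release store once the load has seen true.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration only.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag published by the producer

int message_passing_demo()
{
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish the flag
    });

    std::thread consumer([&seen] {
        // The relaxed load gives no ordering by itself.
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // The fence comes AFTER the load that observed the release store.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;      // ordered after the producer's write to payload
        assert(seen == 42);  // cannot fire under the C++ memory model
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling message_passing_demo() should return 42. Moving the fence above the loop would drop the guarantee, because the fence would no longer follow the load that read the published flag.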

Here's an example using the CppMem tool ( http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/ ). The first thread uses seq_cst stores, so we know the writes occur in that order. CppMem gives "8 executions; 1 consistent, race free", indicating that it is possible for the 2nd thread to see b==1 && a==0. This is because b.load is allowed to be reordered after a.load.

int main() {
  atomic_int a = 0;
  atomic_int b = 0;

  {{{ {
    a.store(1, mo_seq_cst);
    b.store(1, mo_seq_cst);
  } ||| {
    b.load(mo_relaxed).readsvalue(1);
    a.load(mo_acquire).readsvalue(0);
  } }}}
}

If we replace the acquire-load with an acquire fence, b.load is not allowed to be reordered after a.load. CppMem gives "8 executions; no consistent", confirming that it is not possible.

int main() {
  atomic_int a = 0;
  atomic_int b = 0;

  {{{ {
    a.store(1, mo_seq_cst);
    b.store(1, mo_seq_cst);
  } ||| {
    b.load(mo_relaxed).readsvalue(1);
    atomic_thread_fence(mo_acquire);
    a.load(mo_relaxed).readsvalue(0);
  } }}}
}
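For readers who want to try the fenced variant outside CppMem, the following is a rough C++ translation wrapped in a small stress loop. The helper name count_forbidden_outcomes and the iteration count are assumptions for illustration. A stress run can only fail to find a counterexample; it is CppMem's exhaustive search, not this loop, that shows the outcome is forbidden.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the fenced litmus test `iterations` times and
// returns how often the reader observed b == 1 together with a == 0.
int count_forbidden_outcomes(int iterations)
{
    assert(iterations > 0);
    int forbidden = 0;

    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> a{0};
        std::atomic<int> b{0};
        int seen_a = 0;
        int seen_b = 0;

        std::thread writer([&] {
            a.store(1, std::memory_order_seq_cst);
            b.store(1, std::memory_order_seq_cst);
        });

        std::thread reader([&] {
            seen_b = b.load(std::memory_order_relaxed);
            // If seen_b == 1, this fence synchronizes with the store to b,
            // so the store to a happens-before the load of a below.
            std::atomic_thread_fence(std::memory_order_acquire);
            seen_a = a.load(std::memory_order_relaxed);
        });

        writer.join();
        reader.join();

        if (seen_b == 1 && seen_a == 0) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

A call such as count_forbidden_outcomes(200) should return 0 on a conforming implementation. Swapping the fence for an acquire load on a, as in the first CppMem example, makes the b==1 && a==0 outcome legal, although strongly ordered hardware such as x86 may never actually show it.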
