output 10 with memory_order_seq_cst

Question

When I run this program I get the output 10, which seems impossible to me. I'm running it on an x86_64 Core i3 under Ubuntu.

The output 10 means that c is 1 and d is 0.

In thread t[0] we store c, and for c to be 1, b must already have been set to 1 by thread t[1]. At that point a is already 1, since a=1 comes before the store to c in t[0]. So when thread t[1] stores d, I would expect it to be 1, because a=1.

Can the output 10 happen with memory_order_seq_cst? I tried inserting an atomic_thread_fence(memory_order_seq_cst) in both threads between the first line (the store of 1) and the second line (the store of the loaded value), but it made no difference.

Uncommenting both fences doesn't help. I tried g++ and clang++; both give the same result.

#include<thread>
#include<unistd.h>
#include<cstdio>
#include<atomic>
using namespace std;

atomic<int> a,b,c,d;

void foo(){
        a.store(1,memory_order_seq_cst);
//        atomic_thread_fence(memory_order_seq_cst);
        c.store(b,memory_order_seq_cst);
}

void bar(){
        b.store(1,memory_order_seq_cst);
//        atomic_thread_fence(memory_order_seq_cst);
        d.store(a,memory_order_seq_cst);
}

int main(){
        thread t[2];
        t[0]=thread(foo); t[1]=thread(bar);
        t[0].join();t[1].join();
        printf("%d%d\n",c.load(memory_order_seq_cst),d.load(memory_order_seq_cst));
}

bash$ while [ true ]; do ./a.out | grep "10" ; done 
10
10
10
10

Answer 1

10 (c=1, d=0) is easily explained: bar happened to run first, and finished before foo read b.

Quirks of the inter-core communication needed to get threads started on different cores mean this can easily happen, even though thread(foo) ran first in the main thread. E.g. maybe an interrupt arrived at the core the OS chose for foo, delaying it from actually getting into that code.¹

Remember that seq_cst only guarantees that some total order exists for all seq_cst operations, and that this order is compatible with the sequenced-before order within each thread (and with any other happens-before relationship established by other factors). So the following order of atomic operations is possible, without even treating the a.load² in bar as separate from the d.store of the resulting int temporary:

        b.store(1,memory_order_seq_cst);   // bar1.  b=1
        d.store(a,memory_order_seq_cst);   // bar2.  a.load reads 0, d=0

        a.store(1,memory_order_seq_cst);   // foo1
        c.store(b,memory_order_seq_cst);   // foo2.  b.load reads 1, c=1
// final: c=1, d=0
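
To check which outputs any such total order allows, one can replay every interleaving serially. The following is a minimal sketch, not part of the original program: plain ints stand in for the atomics, each `x.store(y)` is split into a load and a store, and the names `State`, `foo_step`, `bar_step` and `possible_outputs` are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Plain ints stand in for the atomics. Under seq_cst every execution is
// equivalent to some single interleaving, so each one can be replayed serially.
struct State {
    int a = 0, b = 0, c = 0, d = 0;
    int foo_tmp = 0, bar_tmp = 0;  // temporaries holding the loaded values
};

// Step i (0..2) of foo: a.store(1); tmp = b.load(); c.store(tmp)
void foo_step(State& s, int i) {
    if (i == 0) s.a = 1;
    else if (i == 1) s.foo_tmp = s.b;
    else s.c = s.foo_tmp;
}

// Step i (0..2) of bar: b.store(1); tmp = a.load(); d.store(tmp)
void bar_step(State& s, int i) {
    if (i == 0) s.b = 1;
    else if (i == 1) s.bar_tmp = s.a;
    else s.d = s.bar_tmp;
}

// Returns every "cd" string some interleaving of foo and bar can print.
std::set<std::string> possible_outputs() {
    std::set<std::string> out;
    // 'B' = next step of bar, 'F' = next step of foo. The string starts
    // sorted, so next_permutation visits all 20 distinct schedules.
    std::string schedule = "BBBFFF";
    do {
        State s;
        int fi = 0, bi = 0;
        for (char who : schedule) {
            if (who == 'F') foo_step(s, fi++);
            else bar_step(s, bi++);
        }
        assert(fi == 3 && bi == 3);  // each thread ran all of its steps
        out.insert(std::to_string(s.c) + std::to_string(s.d));
    } while (std::next_permutation(schedule.begin(), schedule.end()));
    return out;
}
```

Calling `possible_outputs()` yields the set 01, 10 and 11. The only combination that no interleaving produces is 00, because that would need each thread's load to come before the other thread's store of 1.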

atomic_thread_fence(seq_cst) has no impact anywhere because all your operations are already seq_cst. A fence basically just stops reordering of this thread's operations; it doesn't wait for or sync with fences in other threads.

(Only a load that sees a value stored by another thread can create synchronization. But such a load doesn't wait for the other store; it has no way of knowing there is another store. If you want to keep loading until you see the value you expect, you have to write a spin-wait loop.)
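
A spin-wait version of the two functions might look like the sketch below. It assumes what you actually want is for each thread to see the other thread's store before copying it; the `run_once` helper and the use of `std::this_thread::yield()` inside the loop are illustrative choices, not something from the original code.

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::atomic<int> a{0}, b{0}, c{0}, d{0};

void foo() {
    a.store(1, std::memory_order_seq_cst);
    // Keep loading until bar's store to b is visible to this thread.
    while (b.load(std::memory_order_seq_cst) != 1)
        std::this_thread::yield();
    c.store(b.load(std::memory_order_seq_cst), std::memory_order_seq_cst);
}

void bar() {
    b.store(1, std::memory_order_seq_cst);
    // Keep loading until foo's store to a is visible to this thread.
    while (a.load(std::memory_order_seq_cst) != 1)
        std::this_thread::yield();
    d.store(a.load(std::memory_order_seq_cst), std::memory_order_seq_cst);
}

// Hypothetical helper: reset the variables, run both threads, return (c, d).
std::pair<int, int> run_once() {
    a.store(0); b.store(0); c.store(0); d.store(0);
    std::thread t0(foo), t1(bar);
    t0.join();
    t1.join();
    return {c.load(), d.load()};
}
```

Each thread publishes its own store before it starts waiting, so neither loop can block forever, and every run ends with c=1 and d=1.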

Footnote 1: Since all your atomic vars are probably in the same cache line, even if execution did reach the top of foo and bar at the same time on two different cores, false sharing is likely to let both operations from one thread happen while the other core is still waiting to get exclusive ownership of the line. Although seq_cst stores are slow enough (on x86 at least) that hardware fairness mechanisms might relinquish exclusive ownership after committing the first store of 1. Anyway, there are lots of ways for both operations in one thread to happen before the other thread's, giving 10 or 01. It's even possible to get 11, if b=1 and a=1 both happen before either load. Using seq_cst does stop the hardware from doing the load early (before the store is globally visible), so 11 is very possible.

Footnote 2: The lvalue-to-rvalue evaluation of bare a uses the overloaded (int) conversion, which is equivalent to a.load(seq_cst). The operations from foo could happen between that load and the d.store that gets a temporary value from it. d.store(a) is not an atomic copy; it's equivalent to int tmp = a; d.store(tmp);. That split isn't needed to explain your observations, though.
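
To make the "not an atomic copy" point concrete, here is a small single-threaded replay of one such ordering, with foo's operations placed between the two halves of bar's d.store(a). It is only a sketch: the `Result` struct and the `replay_split_copy` function are invented for this illustration, and the statements are written in one thread to pin down the order that real threads would only reach by chance.

```cpp
#include <atomic>

struct Result { int a, c, d; };

// Replays one legal seq_cst order in which foo runs entirely between
// the load and the store that make up bar's d.store(a).
Result replay_split_copy() {
    std::atomic<int> a{0}, b{0}, c{0}, d{0};

    b.store(1);          // bar: b = 1
    int tmp = a.load();  // bar: first half of d.store(a), reads 0

    a.store(1);          // foo: a = 1
    c.store(b.load());   // foo: b.load reads 1, so c = 1

    d.store(tmp);        // bar: second half, stores the stale 0

    return {a.load(), c.load(), d.load()};
}
```

After the replay, a is 1 while d is 0: d received the value a held when it was loaded, not the value a holds at the time of the store.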

Answer 2

The printf statements are unsynchronized so output of 10 can be just a reordered 01.
01 happens when the functions before the printf run serially.

solution1
2 ACCEPTED 2021-03-28 14:56:18

solution2
0 2021-03-28 10:15:46
