Is this understanding correct for this code about Java volatile and reordering?

Question

According to these reordering rules, if I have code like this:

volatile int a = 0;

boolean b = false;

void foo1() { a = 10; b = true; }

void foo2() { if (b) { assert a == 10; } }

Suppose Thread A runs foo1 and Thread B runs foo2. Since a = 10 is a volatile store and b = true is a normal store, these two statements could possibly be reordered, which means Thread B may see b == true while a != 10. Is that correct?

Added:

Thanks for your answers!
I am just starting to learn about Java multi-threading and have been troubled a lot by the keyword volatile.

Many tutorials talk about the visibility of a volatile field, for example: "a volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it". I doubt this: how could a completed write to a field be invisible to other threads (or CPUs)?

As I understand it, a completed write means the field has been successfully written back to the cache, and according to MESI, all other threads should then have an Invalid cache line if they had cached this field. One exception (since I am not very familiar with the hardware, this is just a conjecture) is that the result may be written to a register instead of the cache; I do not know whether there is some protocol to keep consistency in that situation, or whether volatile prevents the value from being kept in a register in Java.

Some situations that look like "invisible" writes, for example:

    A=0,B=0; 
    thread1{A=1; B=2;}  
    thread2{if(B==2) {A may be 0 here}}

Suppose the compiler did not reorder it. What we see in thread2 is then due to the store buffer, and I do not think a write sitting in the store buffer counts as a completed write. Because of the store buffer and invalidate queue, the write to A looks invisible, but in fact the write has not finished when thread2 reads A. Even if we make field B volatile, so that the write to B goes into the store buffer with memory barriers, thread2 can still read B as 0 and finish. To me, volatile does not look like it is about the visibility of the field it is declared on; it is more like an edge that makes sure all writes that happen before the volatile field write in ThreadA are visible to all operations after the volatile field read in another ThreadB (where the volatile read happens after the volatile write in ThreadA has completed).

By the way, since I am not a native speaker: I have seen many tutorials in my mother language (and some English tutorials) say that volatile instructs JVM threads to read the value of a volatile variable from main memory and not cache it locally. I do not think that is true. Am I right?

Anyway, thanks for your answers. Since I am not a native speaker, I hope I have expressed myself clearly.

Answer 1

I'm pretty sure the assert can fire. I think a volatile load is only an acquire operation ( https://preshing.com/20120913/acquire-and-release-semantics/ ) wrt. non-volatile variables, so nothing is stopping load-load reordering.

Two volatile operations couldn't reorder with each other, but reordering with non-atomic operations is possible in one direction, and you picked the direction without guarantees.

(Caveat, I'm not a Java expert; it's possible but unlikely volatile has some semantics that require a more expensive implementation.)

More concrete reasoning is that if the assert can fire when translated into asm for some specific architecture, it must be allowed to fire by the Java memory model.

Java volatile is (AFAIK) equivalent to C++ std::atomic with the default memory_order_seq_cst. Thus foo2 can JIT-compile for ARM64 with a plain load for b and an LDAR acquire load for a.

ldar can't reorder with later loads/stores, but can with earlier ones. (Except for stlr release stores: ARM64 was specifically designed to make C++ std::atomic<> with memory_order_seq_cst / Java volatile efficient with ldar and stlr, so a seq_cst store does not have to flush the store buffer immediately, only when an LDAR is seen. That design gives the minimal amount of ordering necessary to still recover sequential consistency as specified by C++ (and I assume Java).)

On many other ISAs, sequentially consistent stores do need to wait for the store buffer to drain, so they are in practice ordered wrt. later non-atomic loads. And again on many ISAs, an acquire or SC load is done with a normal load preceded by a barrier, which blocks loads from crossing it in either direction; otherwise they wouldn't work. That's why having the volatile load of a compile to an acquire-load instruction that just does an acquire operation is key to understanding how this can happen in practice.

(In x86 asm, all loads are acquire loads and all stores are release stores. They are not sequentially consistent, though; x86's memory model is program order + store buffer with store-forwarding, which allows StoreLoad reordering, so Java volatile stores need special asm.

So the assert can't fire on x86, except via compile/JIT-time reordering of the assignments. This is a good example of one reason why testing lock-free code is hard: a failing test can prove there is a problem, but testing on some hardware/software combo can't prove correctness.)

Answer 2

In addition to Peter Cordes's great answer: in terms of the JMM there is a data race on b, since b is a plain variable and there is no happens-before edge between the write of b and the read of b. Only if this happens-before edge existed would you be guaranteed that a thread which sees the write to b also sees the write to a.

Instead of making a volatile, you need to make b volatile.

int a=0;
volatile int b=0;

void thread1(){
    a=1;
    b=1;
}

void thread2(){
  if(b==1) assert a==1;
}

So if thread2 sees b=1, then the write of b=1 is ordered before this read in the happens-before order (volatile variable rule). And since a=1 and b=1 are ordered in the happens-before order (program order rule), and the read of b and the read of a are ordered in the happens-before order (program order rule again), then due to the transitive nature of the happens-before relation, there is a happens-before edge between the write of a=1 and the read of a, which therefore needs to see the value 1.

You are referring to a possible implementation of the JMM using fences. Although that provides some insight into what happens under the hood, it is also damaging to think in terms of fences because they are not a suitable mental model. See the following counterexample:

https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#myth-barriers-are-sane

Answer 3

Yes, the assert can fail.

volatile int a = 0;

boolean b = false;

void foo1() { a = 10; b = true; }

void foo2() { if (b) { assert a == 10; } }

The JMM guarantees that a write to a volatile field happens-before every subsequent read of that field. In your example, whatever thread A did before a = 10 will happen-before whatever thread B does after reading a (while executing assert a == 10), if that read observes the write. Since b = true executes after a = 10 in thread A (within a single thread, happens-before always follows program order), nothing orders the write of b before thread B's read of b, so there is no ordering guarantee. However, consider this:

int a = 0;

volatile boolean b = false;

void foo1() { a = 10; b = true; }

void foo2() { if (b) { assert a == 10; } }

In this example, the situation is:

a = 10 ---> b = true---|
                       |
                       | (happens-before due to volatile's semantics)
                       |
                       |---> if(b) ---> assert a == 10

Since these actions form a single happens-before chain, the assert is guaranteed to pass.
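As a rough, runnable sketch of the second variant (plain a, volatile b): the reader below spins on the volatile flag instead of checking it once, so that each run actually reaches the read of a. The class name VolatilePublish and the helper runOnce are made up for this illustration and are not part of the question's code.

```java
// Sketch only: plain payload a published through the volatile flag b.
class VolatilePublish {
    static int a = 0;                    // plain field
    static volatile boolean b = false;   // volatile flag

    static void foo1() { a = 10; b = true; }

    // Spins until the volatile read of b returns true, then reads a.
    static int foo2() {
        while (!b) {
            // busy-wait; b is volatile, so the read cannot be hoisted out of the loop
        }
        return a;   // the write a = 10 happens-before this read
    }

    // Hypothetical helper: runs one writer and one reader and returns what the reader saw.
    static int runOnce() {
        a = 0;
        b = false;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = foo2());
        Thread writer = new Thread(VolatilePublish::foo1);
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();   // join() makes seen[0] visible to the calling thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Under the JMM, runOnce() should always return 10. Moving the volatile keyword from b back to a removes that guarantee, although a failing run may be hard to provoke on some hardware.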

Answer 4

Answer to your addition.

Many tutorials talk about the visibility of a volatile field, for example: "a volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it". I doubt this: how could a completed write to a field be invisible to other threads (or CPUs)?

The compiler might mess up code.

eg

boolean stop;

void run(){
  while(!stop)println();
}

first optimization

void run(){
   boolean r1=stop;
   while(!r1)println();
}

second optimization

void run(){
   boolean r1=stop;
   if(r1)return;
   while(true) println();
}

So now it is obvious that this loop will never stop, because effectively the new value of stop will never be seen. For stores you can do something similar that could postpone them indefinitely.
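For contrast, here is a minimal sketch of the same loop with the flag declared volatile, which forbids hoisting the read out of the loop. The class name StopFlag, the counter, and the helper stopsWithin are assumptions made for this example only.

```java
// Sketch only: a stop flag that the worker loop must re-read on every iteration.
class StopFlag {
    static volatile boolean stop = false;
    static long iterations = 0;   // written only by the worker thread

    static void run() {
        while (!stop) {           // volatile read: may not be replaced by a cached local
            iterations++;
        }
    }

    // Hypothetical helper: starts the worker, requests a stop,
    // and reports whether the worker terminated within the given time.
    static boolean stopsWithin(long millis) {
        stop = false;
        Thread worker = new Thread(StopFlag::run);
        worker.setDaemon(true);   // do not keep the JVM alive if the loop never ends
        worker.start();
        try {
            Thread.sleep(50);     // give the JIT a chance to compile the hot loop
            stop = true;
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

Without volatile, the same program may or may not terminate, depending on what the JIT decides to do with the loop.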

As I understand it, a completed write means the field has been successfully written back to the cache, and according to MESI, all other threads should then have an Invalid cache line if they had cached this field.

Correct. This is normally called 'globally visible' or 'globally performed'.

One exception (since I am not very familiar with the hardware, this is just a conjecture) is that the result may be written to a register instead of the cache; I do not know whether there is some protocol to keep consistency in that situation, or whether volatile prevents the value from being kept in a register in Java.

All modern processors are load/store architectures (even x86 after uops conversion), meaning that there are explicit load and store instructions that transfer data between registers and memory, and regular instructions like add/sub can only work with registers. So a register needs to be used anyway. The key part is that the compiler should respect the loads/stores of the source code and limit optimizations.

Suppose the compiler did not reorder it. What we see in thread2 is then due to the store buffer, and I do not think a write sitting in the store buffer counts as a completed write. Because of the store buffer and invalidate queue, the write to A looks invisible, but in fact the write has not finished when thread2 reads A.

On x86, the stores in the store buffer are kept in program order and commit to the cache in program order. But there are architectures where stores from the store buffer can commit to the cache out of order, e.g. due to:

write coalescing
allowing stores to commit to cache as soon as the cache line is returned in the right state, no matter if an earlier store is still waiting.
sharing the store buffer with a subset of the CPUs.

Store buffers can be a source of reordering; but also out of order and speculative execution can be a source.

Apart from the stores, reordering loads can also lead to observing stores out of order. On x86, loads can't be reordered with other loads, but on ARM it is allowed. And of course the JIT can mess things up as well.

Even if we make field B volatile, so that the write to B goes into the store buffer with memory barriers, thread2 can still read B as 0 and finish.

It is important to realize that the JMM is based on sequential consistency. Even though it is a relaxed memory model (it separates plain loads and stores from synchronization actions like volatile load/store and lock/unlock), if a program has no data races, it will only produce sequentially consistent executions. For sequential consistency, the real-time order doesn't need to be respected. So it is perfectly fine for a load/store to be skewed as long as:

the memory order is a total order over all loads/stores
the memory order is consistent with the program order
a load sees the most recent write before it in the memory order.
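One way to see these three conditions at work is the classic store-buffering litmus test, sketched below with both fields volatile so the program has no data races. Under any total memory order that respects program order, whichever load comes last follows both stores, so r1 == 0 and r2 == 0 cannot both hold. The class name StoreBufferLitmus and the method countBothZero are hypothetical names chosen for this sketch.

```java
// Sketch only: store-buffering litmus test with volatile fields.
class StoreBufferLitmus {
    static volatile int x, y;
    static int r1, r2;   // each written by one thread, read only after join()

    // Runs the test `rounds` times and counts how often both loads returned 0.
    static int countBothZero(int rounds) {
        int bothZero = 0;
        for (int i = 0; i < rounds; i++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; r1 = y; });
            Thread t2 = new Thread(() -> { y = 1; r2 = x; });
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                bothZero++;   // not a sequentially consistent outcome
            }
        }
        return bothZero;
    }
}
```

With x and y declared as plain fields instead, the (0, 0) outcome is allowed and can show up on real hardware, including x86, because of StoreLoad reordering through the store buffer.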

To me, volatile does not look like it is about the visibility of the field it is declared on; it is more like an edge that makes sure all writes that happen before the volatile field write in ThreadA are visible to all operations after the volatile field read in another ThreadB (where the volatile read happens after the volatile write in ThreadA has completed).

You are on the right path.

Example.

int a=0;
volatile int b=0;

thread1(){
   1:a=1
   2:b=1
}

thread2(){
   3:r1=b
   4:r2=a
}

In this case there is a happens before edge between 1-2 (program order). If r1=1, then there is happens before edge between 2-3 (volatile variable) and a happens before edge between 3-4 (program order).

Because the happens before relation is transitive, there is a happens before edge between 1-4. So r2 must be 1.
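The four numbered actions can be turned into a small experiment that tallies the observed (r1, r2) pairs. This is only a sketch: the class name MessagePassingLitmus and the method countForbidden are invented for illustration, and a test like this can only fail to find a violation, it cannot prove the guarantee.

```java
// Sketch only: actions 1-4 from the example, repeated many times.
class MessagePassingLitmus {
    static int a;            // plain
    static volatile int b;   // volatile
    static int r1, r2;       // written by thread2, read only after join()

    // Counts how often the forbidden outcome r1 == 1 && r2 == 0 is observed.
    static int countForbidden(int rounds) {
        int forbidden = 0;
        for (int i = 0; i < rounds; i++) {
            a = 0;
            b = 0;
            Thread thread1 = new Thread(() -> {
                a = 1;    // 1
                b = 1;    // 2
            });
            Thread thread2 = new Thread(() -> {
                r1 = b;   // 3
                r2 = a;   // 4
            });
            thread1.start();
            thread2.start();
            try {
                thread1.join();
                thread2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r1 == 1 && r2 == 0) {
                forbidden++;
            }
        }
        return forbidden;
    }
}
```

The outcomes (0, 0), (0, 1) and (1, 1) are all allowed. When r1 is 0 there is no happens-before edge between 2 and 3, so r2 may be either value.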

volatile takes care of the following:

Visibility: it needs to make sure the load/store doesn't get optimized out.
Atomicity: the load/store is atomic, so it should not be seen partially.
Ordering: most importantly, it needs to make sure that the order between 1-2 and 3-4 is preserved.

By the way, since I am not a native speaker: I have seen many tutorials in my mother language (and some English tutorials) say that volatile instructs JVM threads to read the value of a volatile variable from main memory and not cache it locally. I do not think that is true.

You are completely right. This is a very common misconception. Caches are the source of truth since they are always coherent. If every write needed to go to main memory, programs would become extremely slow. Memory is just a spill bucket for whatever doesn't fit in the cache and can be completely incoherent with the cache. Plain and volatile loads/stores alike go through the cache. It is possible to bypass the cache in special situations like MMIO or when using e.g. SIMD instructions, but that isn't relevant for these examples.

Anyway, thanks for your answers. Since I am not a native speaker, I hope I have expressed myself clearly.

Most people here are not native speakers (I'm certainly not). Your English is good enough and you show a lot of promise.

4 answers

solution1
3 2021-10-14 13:37:54

solution2
1 2021-10-14 17:39:14

solution3
1 2021-10-14 19:04:12

solution4
1 2021-10-15 14:21:09
