Java Memory Model and Concurrency

Question

Given x86's total store order (TSO) and the happens-before relationship in the Java Memory Model, we know the compiler doesn't guarantee the order in which instructions execute. It can reorder them as it sees fit to improve performance. Given that, we have:

EAX, EBX are names of registers
[x], [y] are memory locations
r1 and r2 are names of local variables
x, y are shared variables accessible to all threads. All variables are 32-bit integers.
No, it's NOT A HOMEWORK QUESTION.

So I have two problems for which I'm trying to determine the possible outputs:

[x] == [y] == 0 // the memory at [x] and [y] initially holds 0.

// Thread 1                         Thread 2
MOV [x] <- 1                        MOV [y] <- 1
MOV EAX <- [y]                      MOV EBX <- [x]

What are the possible values for the registers EAX and EBX?

int x = 0;
int y = 0;

// Thread 1                         Thread 2
x = 1;                              y = 1; 
r1 = y;                             r2 = x;

What are the possible values for r1 and r2?

Answer 1

Writing a 32-bit integer is guaranteed to be atomic by the JVM, so this is not an issue.

You have 2 variables x and y shared between threads without synchronization.

Thread1 mutates x and reads y.
Thread2 mutates y and reads x.

Therefore, thread1 could read a stale value of y (so it sees either 0 or 1), and thread2 could read a stale value of x (either 0 or 1).

This means you can get all four possible combinations of (eax, ebx): (0,0) (0,1) (1,0) (1,1)
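One way to probe this empirically is a small litmus-test harness. The sketch below is only an illustration: the class name `StoreLoadLitmus`, the `run` method, and the iteration count are made up for this example, and starting fresh threads per iteration is a crude way to get the two bodies to overlap. Which outcomes actually show up, and how often, depends on the JVM, the JIT, and the hardware, so treat the counts as observations rather than proof.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical harness; names and iteration count are illustrative assumptions.
class StoreLoadLitmus {
    // Plain (non-volatile) shared fields, as in the question.
    static int x, y;
    // Results written by the worker threads; Thread.join() makes them visible to the caller.
    static int r1, r2;

    /** Runs the two-thread test repeatedly; returns counts indexed by r1 * 2 + r2. */
    static long[] run(int iterations) {
        long[] counts = new long[4];
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            CountDownLatch start = new CountDownLatch(1);
            Thread t1 = new Thread(() -> {
                await(start);
                x = 1;      // store
                r1 = y;     // load
            });
            Thread t2 = new Thread(() -> {
                await(start);
                y = 1;      // store
                r2 = x;     // load
            });
            t1.start();
            t2.start();
            start.countDown(); // release both threads at roughly the same time
            join(t1);
            join(t2);
            counts[r1 * 2 + r2]++;
        }
        return counts;
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] c = run(20_000);
        System.out.printf("(0,0)=%d (0,1)=%d (1,0)=%d (1,1)=%d%n", c[0], c[1], c[2], c[3]);
    }
}
```

A run that never prints a nonzero count for some combination does not show that the combination is forbidden; it only means this particular run did not hit it.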

Answer 2

x86 has a strongly ordered memory model, but it still allows StoreLoad reordering.

Jeff Preshing's blog post, Memory Reordering Caught in the Act, uses exactly that pair of store-then-load sequences as a test case to prove that reordering really can be observed on real hardware. He has source code and everything.

Note that each thread has its own architectural state (including all the registers). So thread1's EAX is different from thread2's EAX. Using EBX in thread2 only makes it easier to talk about, not any different from a what-can-happen POV.

Anyway, both registers can end up with 0. This rarely happens, but it can, because each thread's store can be delayed (in a store buffer or whatever) until after the other thread's load has chosen a value. Having this be legal lets the CPU aggressively use prefetched data to satisfy loads, and to buffer stores so they may not become globally visible right away when they retire. ("retire" means the architectural state (including EIP) of the thread running the instruction has moved on to the next instruction, and the effects are committed.)
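The store-buffer explanation can be made concrete with a toy model. The sketch below is not how real hardware is built: `StoreBufferModel`, `Core`, and the string "addresses" are invented for illustration. It only shows the one scheduling in which both stores are still sitting in private buffers when the loads read memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model (an assumption for illustration): each simulated core has a private store buffer.
class StoreBufferModel {
    static final Map<String, Integer> memory = new LinkedHashMap<>();

    static final class Core {
        private final Map<String, Integer> storeBuffer = new LinkedHashMap<>();

        // A store goes into the private buffer; other cores cannot see it yet.
        void store(String addr, int value) {
            storeBuffer.put(addr, value);
        }

        // A load sees this core's own pending store, otherwise shared memory.
        int load(String addr) {
            Integer pending = storeBuffer.get(addr);
            return pending != null ? pending : memory.get(addr);
        }

        // Draining makes the buffered stores globally visible.
        void drain() {
            memory.putAll(storeBuffer);
            storeBuffer.clear();
        }
    }

    /** Returns {EAX, EBX, final x, final y} for the schedule where both loads precede both drains. */
    static int[] bothLoadsBeforeDrain() {
        memory.put("x", 0);
        memory.put("y", 0);
        Core core1 = new Core();
        Core core2 = new Core();
        core1.store("x", 1);        // Thread 1: MOV [x] <- 1 (buffered)
        core2.store("y", 1);        // Thread 2: MOV [y] <- 1 (buffered)
        int eax = core1.load("y");  // Thread 1: MOV EAX <- [y], memory still holds 0
        int ebx = core2.load("x");  // Thread 2: MOV EBX <- [x], memory still holds 0
        core1.drain();
        core2.drain();
        return new int[] { eax, ebx, memory.get("x"), memory.get("y") };
    }

    public static void main(String[] args) {
        int[] r = bothLoadsBeforeDrain();
        System.out.println("EAX=" + r[0] + " EBX=" + r[1] + " x=" + r[2] + " y=" + r[3]);
        // prints: EAX=0 EBX=0 x=1 y=1
    }
}
```

In this model each core's own stores and loads stay in program order; only the moment a store becomes visible to the other core is delayed, which is enough to produce (0,0).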

Whatever the registers hold, both globals are always 1 once the dust settles. All four combinations of 0 and 1 in the two threads' registers are possible, including both being 1, which happens when each thread sees the other's store. I'm not sure how likely this is; it might require one thread being interrupted after its store but before its load. If both threads are running on the same physical core (hyperthreading), this outcome is much more likely.

Even if the storage for x and y is unaligned and crosses a cache-line boundary, 0 and 1 are the only possible values, because the two values differ only in the least significant byte. (C compiler output, and JVMs, will align variables to their natural alignment, making this a non-issue, but you can do anything you want in asm so I thought I'd mention it.)

If you were storing a 32-bit -1 to 4 bytes that span two cache lines, the other thread could load a value of 0x00ffffff or 0xff000000, 0x0000ffff or 0xffff0000, etc. (depending on where the cache-line boundary was), as well as the usual 0 or 0xffffffff (aka -1).
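The tearing arithmetic can be checked with a few lines of bit masking. This is a sketch under stated assumptions: a little-endian 4-byte location, a store that splits into exactly two parts at the cache-line boundary, and the two parts becoming visible independently. The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a 32-bit store torn across a cache-line boundary.
class TornStoreModel {
    /**
     * Values a reader could observe when a store of newValue over oldValue is split
     * after lowBytes bytes (1 to 3, little-endian) and either part may land first.
     */
    static Set<Long> observable(int oldValue, int newValue, int lowBytes) {
        int lowMask = (1 << (8 * lowBytes)) - 1; // bytes that live in the first cache line
        int highMask = ~lowMask;                 // bytes that live in the second cache line
        Set<Long> seen = new TreeSet<>();
        seen.add(Integer.toUnsignedLong(oldValue));                                     // nothing visible yet
        seen.add(Integer.toUnsignedLong(newValue));                                     // both parts visible
        seen.add(Integer.toUnsignedLong((newValue & lowMask) | (oldValue & highMask))); // low part only
        seen.add(Integer.toUnsignedLong((oldValue & lowMask) | (newValue & highMask))); // high part only
        return seen;
    }

    public static void main(String[] args) {
        for (int lowBytes = 1; lowBytes <= 3; lowBytes++) {
            StringBuilder line = new StringBuilder("split after " + lowBytes + " byte(s):");
            for (long v : observable(0, -1, lowBytes)) {
                line.append(String.format(" 0x%08x", v));
            }
            System.out.println(line);
        }
        // Storing 1 over 0 can only ever be seen as 0 or 1, wherever the split falls.
        System.out.println("0 -> 1: " + observable(0, 1, 2));
    }
}
```

For a store of 1 over 0, both torn combinations collapse to 0 or 1 because only the lowest byte changes, which matches the argument above.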

Re: Java. I haven't read up on the Java memory model. Other answers are saying it even allows compile-time reordering (like C++11's std::atomic rules). Even if not, without a full memory barrier, StoreLoad reordering can happen. So all four results are possible.

This is true even if your JVM is running on an x86 CPU (rather than weakly-ordered hardware like ARM).
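For contrast, here is a sketch of the same test with both shared fields declared `volatile`. Using `volatile` is this example's assumption, not something the question asked for. The Java Memory Model places all volatile accesses in a single order consistent with each thread's program order, so the (0,0) outcome is ruled out; on x86 that requires the JIT to prevent StoreLoad reordering after the volatile store. The class and method names are hypothetical.

```java
// Hypothetical variant of the litmus test; volatile fields are an assumption of this sketch.
class VolatileLitmus {
    static volatile int x, y;
    // Plain result fields; Thread.join() makes them visible to the caller.
    static int r1, r2;

    /** Runs the test repeatedly and returns how often both threads read 0. */
    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> {
                x = 1;      // volatile store
                r1 = y;     // volatile load, cannot be ordered before the store above
            });
            Thread t2 = new Thread(() -> {
                y = 1;
                r2 = x;
            });
            t1.start();
            t2.start();
            join(t1);
            join(t2);
            if (r1 == 0 && r2 == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("(0,0) observed " + countBothZero(20_000) + " times");
    }
}
```

The other three outcomes remain possible with `volatile`; only the case where neither thread sees the other's store is excluded.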

This answer to another question may shed some light on why LFENCE/SFENCE exist on x86, even though they are no-ops in most cases (i.e. when not using movnt or weakly-ordered memory regions such as USWC video memory).

Or, just read Jeff Preshing's other blog posts to learn more about memory ordering. I found them really helpful myself.

Answer 3

We can simply label statements as below:

A) [x] <- 1            C) [y] <- 1
B) EAX <- [y]          D) EBX <- [x]

Assuming each thread's instructions take effect in program order, A comes before B and C comes before D, so just insert C and D into AB in every possible way:

CDAB
CADB
CABD
ACDB
ACBD
ABCD

Then consider the result of each ordering. The majority (four of the six) start with either AC or CA and produce (EAX,EBX)=(1,1), since both stores happen before either register is loaded. All that's left is to check the other two: CDAB gives (EAX,EBX)=(1,0), and ABCD gives (EAX,EBX)=(0,1).

For the Java version, you state that the compiler does not guarantee the order of the statements executed. In that case, it shouldn't be difficult to order A, B, C, and D to get (0,0), (1,0), (0,1), and (1,1).
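The enumeration above can be reproduced mechanically. The sketch below (class and method names are made up for this example) generates every merge of AB and CD that keeps each thread's program order and executes it step by step. It models only in-order interleaving, so it cannot produce (0,0); that outcome needs a store to become visible after a later load, which this model does not represent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical enumerator for in-order interleavings of the two threads.
class InterleavingEnumerator {
    /** Collects every merge of left and right that preserves the order within each string. */
    static void merge(String left, String right, String prefix, List<String> out) {
        if (left.isEmpty() && right.isEmpty()) {
            out.add(prefix);
            return;
        }
        if (!left.isEmpty()) {
            merge(left.substring(1), right, prefix + left.charAt(0), out);
        }
        if (!right.isEmpty()) {
            merge(left, right.substring(1), prefix + right.charAt(0), out);
        }
    }

    /** Executes one interleaving in the given order and returns "(EAX,EBX)". */
    static String execute(String order) {
        int x = 0, y = 0, eax = 0, ebx = 0;
        for (char step : order.toCharArray()) {
            switch (step) {
                case 'A': x = 1; break;    // [x] <- 1
                case 'B': eax = y; break;  // EAX <- [y]
                case 'C': y = 1; break;    // [y] <- 1
                case 'D': ebx = x; break;  // EBX <- [x]
                default: throw new IllegalArgumentException("unknown step " + step);
            }
        }
        return "(" + eax + "," + ebx + ")";
    }

    /** Returns the distinct (EAX,EBX) results over all in-order interleavings. */
    static Set<String> outcomes() {
        List<String> orders = new ArrayList<>();
        merge("AB", "CD", "", orders);
        Set<String> results = new TreeSet<>();
        for (String order : orders) {
            results.add(execute(order));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> orders = new ArrayList<>();
        merge("AB", "CD", "", orders);
        for (String order : orders) {
            System.out.println(order + " -> " + execute(order));
        }
        System.out.println("distinct outcomes: " + outcomes());
    }
}
```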

3 answers

solution1
4 ACCEPTED 2015-11-29 19:21:22

solution2
4 2015-11-29 20:43:49

solution3
2 2015-11-29 19:12:34
