
Why can `asm volatile("" ::: "memory")` serve as a compiler barrier?

It is known that asm volatile("" ::: "memory") can serve as a compiler barrier that prevents the compiler from reordering memory accesses across it. For example, this is mentioned in https://preshing.com/20120625/memory-ordering-at-compile-time/ , in the section "Explicit Compiler Barriers".

However, all the articles I can find only state that asm volatile("" ::: "memory") can serve as a compiler barrier, without explaining why the "memory" clobber effectively forms one. The GCC online documentation only says that the special clobber "memory" tells the compiler that the assembly code may perform memory reads or writes other than those specified in the operand lists. But how does that semantic cause the compiler to give up any attempt to reorder memory instructions across it? I tried to answer this myself but failed, so I ask here: why can asm volatile("" ::: "memory") serve as a compiler barrier, based on the semantics of the "memory" clobber? Please note that I am asking about a "compiler barrier" (in effect at compile time), not the stronger "memory barrier" (in effect at run time). For convenience, I excerpt the semantics of the "memory" clobber from the GCC online documentation below:

The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.

If a variable is potentially read or written, it matters what order that happens in. The point of a "memory" clobber is to make sure the reads and/or writes in an asm statement happen at the right point in the program's execution.

Any read of a C variable's value that happens in the source after an asm statement must be after the memory-clobbering asm statement in the compiler-generated assembly output for the target machine, otherwise it might be reading a value before the asm statement would have changed it.

Any read of a C var in the source before an asm statement similarly must stay sequenced before, otherwise it might incorrectly read a modified value.

Similar reasoning applies to assignments to (writes of) C variables before or after any asm statement with a "memory" clobber. This is just like a call to an "opaque" function, one whose definition the compiler can't see.

No reads or writes can reorder with the barrier in either direction, therefore no operation before the barrier can reorder with any operation after the barrier, or vice versa.
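As a minimal sketch of that ordering guarantee (the names payload, ready, publish and the compiler_barrier macro are hypothetical, chosen only for illustration), two plain stores separated by the barrier must appear in source order in the generated assembly. This constrains the compiler only; it says nothing about what the CPU does at run time.

```c
/* Hypothetical globals: reachable from outside this function,
 * so a "memory" clobber is assumed to read and write them. */
int payload;
int ready;

/* Compiler-only barrier: an empty asm that emits no instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

void publish(int value)
{
    payload = value;     /* store 1: must be emitted before the asm */
    compiler_barrier();  /* treated as reading and writing all reachable memory */
    ready = 1;           /* store 2: must be emitted after the asm */
}
```

Without the barrier, the compiler would be free to emit the store to ready first, since the two stores are independent as far as a single thread can tell.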


Another way to look at it: the actual machine memory contents must match the C abstract machine at that point. The compiler-generated asm has to respect that, by storing any variable values from registers to memory before the start of an asm("":::"memory") statement, and afterwards it has to assume that any registers that had copies of variable values might not be up to date anymore. So they have to be reloaded if they're needed.
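A small sketch of the reload side of this, assuming a hypothetical global shared_counter and helper read_twice: without the barrier the compiler could load the global once and reuse the register copy; with it, the second read has to go back to memory.

```c
/* Hypothetical global; anything could in principle modify it
 * "inside" the asm statement, as far as the compiler knows. */
int shared_counter;

#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

int read_twice(void)
{
    int first = shared_counter;   /* first load from memory */
    compiler_barrier();           /* register copy of shared_counter is now stale */
    int second = shared_counter;  /* must be loaded again, not reused */
    return first + second;
}
```

Comparing the output of gcc -O2 -S with and without the compiler_barrier() line is one way to see the second load appear and disappear.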

This reads-everything / writes-everything assumption for the "memory" clobber is what keeps the asm statement from reordering at compile time with respect to all accesses, even non-volatile ones. The volatile is already implicit for an asm() statement with no "=..." output operands, and is what stops the statement from being optimized away entirely (and with it the memory clobber).


Note that only potentially "reachable" C variables are affected. For example, escape analysis can still let the compiler keep a local int i in a register across a "memory" clobber, as long as the asm statement itself doesn't have the address as an input.

Just like a function call: for (int i=0;i<10;i++) {foobar("%d\n", i);} can keep the loop counter in a register, and just copy it to the 2nd arg-passing register for foobar every iteration. There's no way foobar can have a reference to i because its address hasn't been stored anywhere or passed anywhere.

(This is fine for the memory barrier use-case; no other thread could have its address either.)
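The same point with the barrier itself, as a sketch (sum_below is a hypothetical function): the locals below never have their address taken, so the compiler is allowed to keep them in registers across every "memory" clobber in the loop. Whether it actually does so depends on the compiler and optimization level.

```c
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

int sum_below(int n)
{
    int total = 0;            /* address never escapes: not reachable by the asm */
    for (int i = 0; i < n; i++) {
        total += i;
        compiler_barrier();   /* need not force i or total out to the stack */
    }
    return total;
}
```

If the address of total were passed to the asm as an input operand, or stored in a global, the compiler would have to treat it as memory the asm might touch.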



I'll add that the "memory" clobber is only a compiler directive. An out-of-order processor may still reorder memory operations at run time. To prevent this, an explicit memory barrier is necessary. See the Linux kernel documentation on memory barriers.
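For contrast, here is a sketch of a run-time barrier using the C11 <stdatomic.h> fences rather than the Linux kernel macros (message, message_ready, send_message and try_receive are hypothetical names). atomic_thread_fence constrains both the compiler and the hardware, and on weakly ordered targets it typically compiles to a barrier instruction. The C11 counterpart of a compiler-only barrier is atomic_signal_fence.

```c
#include <stdatomic.h>

int message;               /* hypothetical plain payload */
atomic_int message_ready;  /* hypothetical flag, zero-initialized */

void send_message(int value)
{
    message = value;
    /* Release fence: earlier writes become visible before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&message_ready, 1, memory_order_relaxed);
}

int try_receive(int *out)
{
    if (atomic_load_explicit(&message_ready, memory_order_relaxed) == 0)
        return 0;          /* nothing published yet */
    /* Acquire fence: the payload read cannot move before the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = message;
    return 1;
}
```

Replacing the two atomic_thread_fence calls with asm volatile("" ::: "memory") would keep the compiler from reordering the accesses but would leave the CPU free to do so on architectures with weaker ordering than x86.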
