
Why isn't [[carries_dependency]] the default in C++?

I know that memory_order_consume has been deprecated, but I'm trying to understand the logic behind the original design and how [[carries_dependency]] and kill_dependency were supposed to work. For that, I would like a specific example of code that would break on an IBM PowerPC or DEC Alpha, or even on a hypothetical architecture with a hypothetical compiler that fully implemented consume semantics in C++11 or C++14.

The best I can come up with is an example like this:

#include <atomic>
#include <cassert>

int v;
std::atomic<int*> ap;

void
thread_1()
{
  v = 1;
  ap.store(&v, std::memory_order_release);
}

int
f(int *p [[carries_dependency]])
{
  return v;
}

void
thread_2()
{
  int *p;
  while (!(p = ap.load(std::memory_order_consume)))
    ;
  int v2 = f(p);
  assert(*p == v2);
}

I understand that the assertion could fail in this code. However, is the assertion supposed to never fail if you remove [[carries_dependency]] from f? If so, why? After all, you requested memory_order_consume, so why would you expect other accesses to v to get acquire semantics? If removing [[carries_dependency]] does not make the code correct, then what's an example where [[carries_dependency]] (or making [[carries_dependency]] the default for all variables) breaks otherwise-correct code?

The only thing I can think of is that maybe this has to do with register spills? If a function spills a register onto the stack and later reloads it, that could break the dependency chain. So maybe [[carries_dependency]] makes things efficient in some cases (it says there's no need to issue a memory barrier in the caller before calling this function) but also requires the callee to issue a memory barrier before any register spill or call to another function, which could be less efficient in other cases? I'm grasping at straws here, though, so I'd still love to hear from someone who understands this stuff...

return v doesn't have a data dependency on int *p, so you'd need acquire, not consume, for ap.load(consume) / f(p) to synchronize with the release store.
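A minimal sketch of that fix, reusing the question's globals v and ap. One change is an assumption added only so the result can be checked: thread_2 now returns the value it read. The load becomes acquire, so f()'s plain read of v is ordered after it even though that read has no dependency on p.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int v;
std::atomic<int*> ap{nullptr};

void thread_1() {
    v = 1;                                    // plain (non-atomic) store
    ap.store(&v, std::memory_order_release);  // publish the pointer
}

int f(int* /*p*/) {
    return v;  // reads v directly: no data dependency on p at all
}

int thread_2() {
    int* p;
    // acquire instead of consume: every later access in this thread,
    // including f()'s independent read of v, is ordered after this load
    while (!(p = ap.load(std::memory_order_acquire)))
        ;
    int v2 = f(p);
    assert(*p == v2);
    return v2;  // returned only so the caller can check it (not in the original)
}
```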

If you'd used return *p, this would be sufficient thanks to dependency ordering: that load has a data dependency on the earlier load, so there's no way for the CPU to generate the address early and load from v before the load from ap that saw the value you were waiting for.
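For comparison, a sketch of the dependency-carrying version under the same assumptions (the question's globals, with thread_2 returning its result so it can be checked). Here f() reads through p, so under the C++11 rules the load in f() is dependency-ordered after the consume load. Current compilers just promote memory_order_consume to acquire, so this won't show any cheaper codegen in practice; it only shows the shape of code consume was designed for.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int v;
std::atomic<int*> ap{nullptr};

void thread_1() {
    v = 1;
    ap.store(&v, std::memory_order_release);
}

int f(int* p [[carries_dependency]]) {
    // The address comes from the consume load, so this read is
    // dependency-ordered after it and must see v == 1.
    return *p;
}

int thread_2() {
    int* p;
    while (!(p = ap.load(std::memory_order_consume)))
        ;  // spin until the pointer is published
    return f(p);
}
```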

Dropping the dependency tracking, e.g. when calling a function whose parameter isn't marked [[carries_dependency]], effectively requires promoting consume to acquire by using a memory barrier before the function call.
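Roughly what that promotion looks like if you write it by hand. This is a sketch, not what any particular compiler emits. The helper g is hypothetical, and its parameter is deliberately not marked [[carries_dependency]]. An acquire fence placed after an atomic load that read the value written by a release store synchronizes with that store, so the read of v inside g is ordered even though no dependency is being tracked any more.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int v;
std::atomic<int*> ap{nullptr};

void thread_1() {
    v = 1;
    ap.store(&v, std::memory_order_release);
}

// Hypothetical callee WITHOUT [[carries_dependency]]: it may be compiled
// with no obligation to preserve a dependency on p.
int g(int* p) { return *p; }

int thread_2() {
    int* p;
    while (!(p = ap.load(std::memory_order_consume)))
        ;
    // Upgrade to acquire before handing p to a callee that drops the
    // dependency: the fence synchronizes with the release store in thread_1.
    std::atomic_thread_fence(std::memory_order_acquire);
    return g(p);
}
```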

DEC Alpha would always need a barrier even for consume to work, i.e. it had to promote consume to acquire because the ISA didn't guarantee dependency ordering in hardware.

Some ISAs (mostly just x86) are so strongly ordered that they don't need a barrier, because every load is an acquire load, not reordered with other loads. Or at least they give the illusion of not being reordered: actual implementations speculatively load early but nuke the pipeline if mis-speculation is detected, i.e. if a cache line is no longer valid by the time the load was architecturally allowed to happen.

So x86 and Alpha would likely still work even with the [[carries_dependency]] version, because they're either too strong or too weak for mo_consume to be something the hardware can actually do more cheaply than mo_acquire.

For Alpha it would depend on where the compiler put the barrier: it could place it after f(p)'s return v, as long as it comes before the *p that actually depends on the consume load. Or it could just promote consume to acquire on the spot at the load, like compilers do now (since consume is deprecated after proving too hard to support in its current design).


As for why ISO C++11 decided to effectively promote consume results to acquire when they're passed across function boundaries, that might have been a usability consideration, but also a performance one: without it, compilers would lose the ability to do some optimizations on incoming function args.

E.g. int ready = foo.load(consume); / if(ready == 1) return non_atomic[ready-ready]; is required to compile to asm that has a data dependency on the consume-load result, unlike the normal case where the compiler would just optimize it to return *non_atomic.
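A sketch of that example with the shorthand expanded. It assumes foo is a std::atomic<int> and non_atomic is a plain int array; the function name consumer is hypothetical. The comment describes what a compiler honouring the C++11 consume rules would have to do. Real compilers promote consume to acquire, after which folding ready - ready to 0 is allowed again.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> foo{0};
int non_atomic[2] = {0, 0};

int consumer() {
    int ready = foo.load(std::memory_order_consume);
    if (ready == 1)
        // ready - ready is always 0, but it still carries a dependency from
        // the consume load, so a compiler honouring consume must compute an
        // address that depends on `ready` instead of reading non_atomic[0].
        return non_atomic[ready - ready];
    return -1;
}
```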

(You might be familiar with x86 xor eax,eax being a good way to zero a register. In weakly-ordered ISAs that guarantee dependency ordering, like ARM, eor r0, r0, r0 is guaranteed not to break the dependency on the old value of r0.)

Constant propagation into branches is also normally possible: code inside if(ready == 1) can only run when ready has the constant value 1. But when ready carries a dependency, even if we weren't cancelling it out, non_atomic[ready] can't be optimized to non_atomic[1].
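std::kill_dependency (declared in <atomic>) is the tool the question mentions for ending a dependency chain deliberately. A sketch of both cases, with the same assumed foo / non_atomic globals and hypothetical function names: one where the chain must be preserved, and one where kill_dependency lets the compiler use the constant from the branch condition.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> foo{0};
int non_atomic[2] = {0, 0};

int read_dependent() {
    int ready = foo.load(std::memory_order_consume);
    if (ready == 1)
        return non_atomic[ready];  // chain preserved: can't become non_atomic[1]
    return -1;
}

int read_killed() {
    int ready = foo.load(std::memory_order_consume);
    if (ready == 1)
        // kill_dependency ends the chain, so the compiler may rewrite this as
        // non_atomic[1]; the read is then no longer dependency-ordered.
        return non_atomic[std::kill_dependency(ready)];
    return -1;
}
```

Both functions return the same value here; the difference is only in which optimizations are permitted and whether the read is dependency-ordered.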

If every incoming function arg potentially carried a dependency, compilers would not be able to do those usual optimizations on any function args or values derived from them.
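To illustrate the trade-off at a function boundary, a sketch with hypothetical lookup functions. With an ordinary parameter the compiler may do the usual constant propagation, while a [[carries_dependency]] parameter obliges it (under the original C++11 design) to keep the index data-dependent on the incoming value. Making the attribute the default would impose the second function's constraint on every function.

```cpp
#include <cassert>

int non_atomic[2] = {0, 0};

// Ordinary parameter: nothing to preserve, so inside the branch the compiler
// may treat i as the constant 1 and read non_atomic[1] directly.
int lookup(int i) {
    if (i == 1)
        return non_atomic[i];
    return -1;
}

// Dependency-carrying parameter: a caller may pass a consume-load result
// without a fence, so the callee has to keep the address computed from i.
int lookup_dep(int i [[carries_dependency]]) {
    if (i == 1)
        return non_atomic[i];
    return -1;
}
```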


Related, re: what consume is about and/or its deprecation, and the fact that it's still used in a hand-rolled way with volatile in Linux kernel code (e.g. RCU):
