
Why isn't [[carries_dependency]] the default in C++?

I know that memory_order_consume has been deprecated, but I'm trying to understand the logic that went into the original design and how [[carries_dependency]] and kill_dependency were supposed to work. For that, I would like a specific example of code that would break on an IBM PowerPC or DEC Alpha, or even on a hypothetical architecture with a hypothetical compiler that fully implemented consume semantics in C++11 or C++14.

The best I can come up with is an example like this:

#include <atomic>
#include <cassert>

int v;
std::atomic<int*> ap;

void
thread_1()
{
  v = 1;
  ap.store(&v, std::memory_order_release);
}

int
f(int *p [[carries_dependency]])
{
  return v;
}

void
thread_2()
{
  int *p;
  while (!(p = ap.load(std::memory_order_consume)))
    ;
  int v2 = f(p);
  assert(*p == v2);
}

I understand that the assertion could fail in this code. However, is it the case that the assertion is not supposed to fail if you remove [[carries_dependency]] from f? If so, why is that the case? After all, you requested a memory_order_consume, so why would you expect other accesses to v to reflect acquire semantics? If removing [[carries_dependency]] does not make the code correct, then what's an example where [[carries_dependency]] (or making [[carries_dependency]] the default for all variables) breaks otherwise correct code?

The only thing I can think of is that maybe this has to do with register spills. If a function spills a register onto the stack and later reloads it, this could break the dependency chain. So maybe [[carries_dependency]] makes things efficient in some cases (it says there is no need to issue a memory barrier in the caller before calling this function) but also requires the callee to issue a memory barrier before any register spill or call to another function, which could be less efficient in other cases? I'm grasping at straws here, though, so I would still love to hear from someone who understands this stuff...

return v doesn't have a data dependency on int *p, so you'd need acquire, not consume, for ap.load(consume) / f(p) to synchronize with the release store.

If you'd used return *p, then consume would be sufficient thanks to dependency ordering: that load would have a data dependency on the earlier load, so there is no way for the CPU to generate the address earlier and thus load from v before the load from ap that saw the value you were waiting for.
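As a minimal sketch of that return *p variant: the names f_dependent and run_dependent_example are hypothetical, and the writer thread is started inside the helper only to keep the example self-contained. Current compilers simply treat consume as acquire, so this cannot demonstrate a failure; it only shows the shape of code where the second load's address comes from the consume load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int v = 0;
std::atomic<int*> ap{nullptr};

// Reads through p, so the address of this load is derived from the
// value returned by the consume load in the caller.
int f_dependent(int *p [[carries_dependency]]) {
    return *p;
}

// Hypothetical driver: one thread publishes &v, this thread consumes it.
int run_dependent_example() {
    v = 0;
    ap.store(nullptr, std::memory_order_relaxed);
    std::thread writer([] {
        v = 1;
        ap.store(&v, std::memory_order_release);
    });
    int *p;
    while (!(p = ap.load(std::memory_order_consume)))
        ;                       // spin until the pointer is published
    int v2 = f_dependent(p);    // dependency-ordered after the release store
    assert(v2 == 1);
    writer.join();
    return v2;
}
```

Calling run_dependent_example() should return 1, because the write to v is ordered before the dependent read through p.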

Dropping the dependency ordering at a function call effectively requires promoting consume to acquire, by using a memory barrier before the call.
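One way to picture that promotion in source form, as a sketch rather than what any particular compiler emits: a relaxed load followed by an acquire fence before the call. The helper name run_fenced_example is hypothetical; f has the same shape as the question's f and reads v without going through p.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int v = 0;
std::atomic<int*> ap{nullptr};

// Same shape as the question's f: ignores its argument and reads v directly.
int f(int *) { return v; }

// Hypothetical driver showing a barrier placed before the function call.
int run_fenced_example() {
    v = 0;
    ap.store(nullptr, std::memory_order_relaxed);
    std::thread writer([] {
        v = 1;
        ap.store(&v, std::memory_order_release);
    });
    int *p;
    while (!(p = ap.load(std::memory_order_relaxed)))
        ;                                                 // spin until published
    std::atomic_thread_fence(std::memory_order_acquire);  // barrier before the call
    int v2 = f(p);      // ordered after the release store via the fence
    assert(*p == v2);   // both reads see v == 1
    writer.join();
    return v2;
}
```

With the fence in place, the read of v inside f no longer relies on any dependency on p, which is why the callee can be compiled like ordinary code.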

DEC Alpha would always need a barrier even for consume to work, i.e. it had to promote consume to acquire, because the ISA didn't guarantee dependency ordering in hardware.

Some ISAs (mostly just x86) are so strongly ordered that they don't need a barrier, because every load is an acquire load, not reordered with other loads. Or at least they give the illusion of not being reordered; actual implementations speculatively load early but nuke the pipeline if mis-speculation is detected, i.e. when a cache line is no longer valid by the time the load was architecturally allowed to happen.

So x86 and Alpha would likely still work even with the [[carries_dependency]] version, because they're either too strong or too weak for mo_consume to be something the hardware can actually do (more cheaply than mo_acquire).

For Alpha it would depend on where the compiler put the barrier; it could put it after f(p)'s return v, only before the *p that actually depends on the consume load. Or it could just promote consume to acquire on the spot at the load, like compilers do now (since consume was deprecated after proving too hard to support in its current design).


As for why ISO C++11 decided to promote consume results to effectively acquire when passing across function boundaries, that might have been a usability consideration. But also performance: without that, compilers would lose the ability to do some optimizations on incoming function args.

e.g. int ready = foo.load(consume); / if(ready == 1) return non_atomic[ready-ready]; is required to make asm that has a data dependency on the consume-load result, unlike the normal case where the compiler would just optimize it to return *non_atomic.
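Written out as a compilable sketch, the snippet looks like the following. The helpers publish and try_read and the array size are assumptions added for illustration. Mainstream compilers promote consume to acquire, so in practice they remain free to fold ready - ready to 0 here; the comments describe what a full consume implementation would have to preserve.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> foo{0};
int non_atomic[2] = {0, 0};

// Hypothetical writer: fill in the payload, then publish the flag.
void publish(int payload) {
    non_atomic[0] = payload;
    foo.store(1, std::memory_order_release);
}

// Reader from the text: the index is always 0, but it is computed from
// the consume-load result, so it formally carries a dependency.
int try_read() {
    int ready = foo.load(std::memory_order_consume);
    if (ready == 1) {
        // Under full consume semantics this must not be folded to
        // non_atomic[0]; the address has to depend on ready.
        return non_atomic[ready - ready];
    }
    return -1;  // flag not published yet
}
```

Before publish is called, try_read() returns -1; after publish(42) it returns 42.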

(You might be familiar with x86 xor eax,eax being a good way to zero a register. In weakly-ordered ISAs that guarantee dependency ordering, like ARM, eor r0, r0, r0 is guaranteed not to break the dependency on the old value of r0.)

Constant-propagation in branches like if(ready == 1) would normally be possible too; code inside that if can only run when ready has the constant value 1. But with a carried dependency, even if we weren't cancelling it out, non_atomic[ready] can't be optimized to non_atomic[1].

If every incoming function arg potentially carried a dependency, compilers would not be able to do those usual optimizations on any function args or values derived from them.
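To make the function-boundary cost concrete, here is a sketch with two hypothetical functions, lookup_plain and lookup_dep, over an assumed table. Both return the same values; the difference is only in what a compiler with full consume support would be allowed to do with the parameter.

```cpp
#include <cassert>

int table[4] = {10, 20, 30, 40};

// Plain parameter: the compiler is free to use i == 1 inside the branch
// and to fold i - i to 0, so neither load needs to depend on i.
int lookup_plain(int i) {
    if (i == 1)
        return table[i];    // may be compiled as table[1]
    return table[i - i];    // may be compiled as table[0]
}

// Annotated parameter: under full consume semantics, both loads would
// have to keep an address dependency on i, blocking those foldings.
int lookup_dep(int i [[carries_dependency]]) {
    if (i == 1)
        return table[i];
    return table[i - i];
}
```

If [[carries_dependency]] were the default, every function would have to be compiled like lookup_dep, whether or not any caller ever passed it a consume-loaded value.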


Related re: what consume is about and/or its deprecation, and that it's still used in a hand-rolled way with volatile in Linux kernel code (e.g. RCU):
