Why is there such a massive difference in compile time between consteval/constexpr and template metafunctions?
I was curious how far I could push GCC as far as compile-time evaluation is concerned, so I made it compute the Ackermann function, specifically with input values of 4 and 1 (anything higher than that is impractical):
consteval unsigned int A(unsigned int x, unsigned int y)
{
    if(x == 0)
        return y+1;
    else if(y == 0)
        return A(x-1, 1);
    else
        return A(x-1, A(x, y-1));
}

unsigned int result = A(4, 1);
(I think the recursion depth is bounded at ~16K, but just to be safe I compiled this with -std=c++20 -fconstexpr-depth=100000 -fconstexpr-ops-limit=12800000000.)
Not surprisingly, this takes up an obscene amount of stack space (in fact, it causes the compiler to crash if run with the default process stack size of 8 MB) and takes several minutes to compute. However, it does eventually get there, so evidently the compiler could handle it.
After that, I decided to try implementing the Ackermann function using templates, with metafunctions and partial-specialization pattern matching. Amazingly, the following implementation takes only a few seconds to evaluate:
template<unsigned int x, unsigned int y>
struct A {
    static constexpr unsigned int value = A<x-1, A<x, y-1>::value>::value;
};

template<unsigned int y>
struct A<0, y> {
    static constexpr unsigned int value = y+1;
};

template<unsigned int x>
struct A<x, 0> {
    static constexpr unsigned int value = A<x-1, 1>::value;
};

unsigned int result = A<4,1>::value;
(compile with -ftemplate-depth=17000)
Why is there such a dramatic difference in evaluation time? Aren't these essentially equivalent? I guess I can understand the consteval solution requiring slightly more memory and evaluation time, because semantically it consists of a bunch of function calls, but that doesn't explain why this exact same (non-consteval) function computed at runtime takes only slightly longer than the metafunction version (when compiled without optimizations).
Why is consteval so slow? I'm almost tempted to conclude that it's being evaluated by a GIMPLE interpreter or something like that. Also, how can the metafunction version be so fast? It's actually not much slower than optimized machine code.
In the template version of A, when a particular specialization, say A<2,3>, is instantiated, the compiler remembers that type and never needs to instantiate it again. This comes from the fact that types are unique, and each "call" to this metafunction is just computing a type.
The consteval function version is not optimized to do this, so A(2,3) may be evaluated multiple times, depending on the control flow, resulting in the performance difference you observe. There's nothing stopping compilers from "caching" the results of function calls, but these optimizations likely just haven't been implemented yet.