
GCC doesn't tail call optimize recursive function

I have written a hexadecimal-parsing function for my uint128 structure, which internally is just two uint64_t values, hi and lo. Here's the function in question:

uint128_t parse_from_hex(const char *string, uint128_t previous_value, const size_t num_digits) {
    if (string == NULL || string[0] == '\0')
        return previous_value;

    if (num_digits + 1 > 2 * SIZEOF_INT128) { // SIZEOF_INT128 is 16 - meaning 16 bytes
        return UINT128_MAX; // the maximum value of 128bit uint, if we overflow
    }

    int64_t current_digit = parse_hex_digit(string[0]);
    if (current_digit < 0)
        return UINT128_ZERO; // a global variable which I use multiple times which represents a 0
    return parse_from_hex(string + 1,
                          uint128_or_uint64(uint128_shift_left(previous_value, 4), (uint64_t)current_digit),
                          num_digits + 1);
}

For some reason, gcc does not apply tail call optimization to this function, even though the recursive call is clearly made exactly once, at the very end of the function. The other functions used in the parsing function have no side effects and simply return a new value, so I do not think the problem is with them. I have tried making the uint128_t struct members non-const (originally they were const) as well as making the function arguments non-const, but that didn't help either. I originally compiled with -Ofast, and also tried -O3 and -O2, with no luck. Could anyone who knows the subject better please help? I thought I understood it quite well, but clearly I'm missing something.

As @BillLynch pointed out in the comments, it is clang that fails to optimize the function for some reason, not GCC. On my PC, GCC 10.2.0 optimizes the function properly, so there is no problem here.
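
Since tail call elimination evidently depends on the compiler here, one way to avoid relying on it at all is to write the same logic as a loop. Below is a minimal sketch of that idea, not a drop-in replacement. The struct layout, the constants, and the three helpers (parse_hex_digit, uint128_shift_left, uint128_or_uint64) are stand-ins I wrote so the example compiles on its own; the real ones are not shown in the question and may differ. parse_from_hex_iter is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: the question only says the struct holds hi and lo. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} uint128_t;

#define SIZEOF_INT128 16 /* bytes, as stated in the question */

/* Assumed definitions of the two constants used in the question. */
static const uint128_t UINT128_ZERO = {0, 0};
static const uint128_t UINT128_MAX = {UINT64_MAX, UINT64_MAX};

/* Stand-in helper: returns 0..15 for a hex digit, -1 otherwise. */
static int64_t parse_hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Stand-in helper: shift left by n bits, valid only for 0 < n < 64. */
static uint128_t uint128_shift_left(uint128_t v, unsigned n) {
    uint128_t r = {(v.hi << n) | (v.lo >> (64 - n)), v.lo << n};
    return r;
}

/* Stand-in helper: OR a 64-bit value into the low half. */
static uint128_t uint128_or_uint64(uint128_t v, uint64_t x) {
    v.lo |= x;
    return v;
}

/* Loop form of the recursive function: each iteration does the work
 * of one recursive call, so stack usage stays constant regardless of
 * what the optimizer decides. */
uint128_t parse_from_hex_iter(const char *string, uint128_t value,
                              size_t num_digits) {
    if (string == NULL)
        return value;

    for (; *string != '\0'; string++, num_digits++) {
        if (num_digits + 1 > 2 * SIZEOF_INT128)
            return UINT128_MAX; /* more than 32 hex digits: overflow */

        int64_t current_digit = parse_hex_digit(*string);
        if (current_digit < 0)
            return UINT128_ZERO; /* invalid character */

        value = uint128_or_uint64(uint128_shift_left(value, 4),
                                  (uint64_t)current_digit);
    }
    return value;
}
```

The signature is kept the same as the recursive version so the two can be swapped and compared; a call such as parse_from_hex_iter("ff", UINT128_ZERO, 0) starts from zero with no digits consumed.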
