Is there a non-looping unsigned 32-bit integer square root function in C?

Question

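I have seen floating-point bit hacks that produce a square root, as shown here: fast floating point square root, but that method works for floats.

Is there a similar method for finding the integer square root of a 32-bit unsigned integer without looping? I have been scouring the web for one but haven't found any.

(My thinking is that the pure binary representation doesn't carry enough information to do this, but since the input is constrained to 32 bits I would guess otherwise.)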

Answer 1

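This answer assumes that the target platform has no floating-point support, or very slow floating-point support (possibly via emulation).

As pointed out in the comments, a count-leading-zeros (CLZ) instruction can be used to provide the fast log₂ functionality that the exponent part of a floating-point operand otherwise provides. On platforms that don't expose this functionality through an intrinsic, CLZ can also be emulated with reasonable efficiency, as shown below.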
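To make the CLZ-as-log₂ idea concrete, here is a minimal sketch. The names clz32, ilog2 and sqrt_pow2_estimate are hypothetical and not part of the answer's code; the sketch only illustrates that 31 - clz(x) equals floor(log₂(x)) for non-zero x, and that halving this exponent yields a power-of-two estimate within a factor of two of the square root.

```c
#include <stdint.h>

/* Hypothetical helper: count leading zeros of a non-zero 32-bit value,
   written as a fixed sequence of five tests (no loop). */
static uint8_t clz32 (uint32_t a)
{
    uint8_t n = 0;
    if ((a & 0xffff0000u) == 0) { n += 16; a <<= 16; }
    if ((a & 0xff000000u) == 0) { n +=  8; a <<=  8; }
    if ((a & 0xf0000000u) == 0) { n +=  4; a <<=  4; }
    if ((a & 0xc0000000u) == 0) { n +=  2; a <<=  2; }
    if ((a & 0x80000000u) == 0) { n +=  1; }
    return n;
}

/* floor(log2(x)) for x > 0: the position of the most significant set bit */
static uint8_t ilog2 (uint32_t x)
{
    return (uint8_t)(31 - clz32 (x));
}

/* Power-of-two estimate r with r <= sqrt(x) < 2*r, valid for x > 0.
   Halving the exponent is the integer analogue of the floating-point trick. */
static uint16_t sqrt_pow2_estimate (uint32_t x)
{
    return (uint16_t)(1u << (ilog2 (x) >> 1));
}
```

For example, sqrt_pow2_estimate (1000) returns 16, and the true square root (about 31.6) lies between 16 and 32. An estimate this coarse would need more refinement than the table-based guess used in the answer.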

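An initial approximation good to a few bits can be pulled from a lookup table (LUT); just as in the floating-point case, it can then be refined by Newton iterations. For a 32-bit integer square root, one to two iterations are generally sufficient. The ISO C99 code below shows a working exemplary implementation, including an exhaustive test.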

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

uint8_t clz (uint32_t a); // count leading zeros
uint32_t umul_16_16 (uint16_t a, uint16_t b); // 16x16 bit multiply
uint16_t udiv_32_16 (uint32_t x, uint16_t y); // 32/16 bit division

/* LUT for initial square root approximation */
static const uint16_t sqrt_tab[32] = 
{ 
    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
    0x85ff, 0x8cff, 0x94ff, 0x9aff, 0xa1ff, 0xa7ff, 0xadff, 0xb3ff,
    0xb9ff, 0xbeff, 0xc4ff, 0xc9ff, 0xceff, 0xd3ff, 0xd8ff, 0xdcff, 
    0xe1ff, 0xe6ff, 0xeaff, 0xeeff, 0xf3ff, 0xf7ff, 0xfbff, 0xffff
};

/* table lookup for initial guess followed by division-based Newton iteration */
uint16_t my_isqrt (uint32_t x)
{
    uint16_t q, lz, y, i, xh;

    if (x == 0) return x; // early out, code below can't handle zero

    // initial guess based on leading 5 bits of argument normalized to 2.30
    lz = clz (x);
    i = ((x << (lz & ~1)) >> 27);
    y = sqrt_tab[i] >> (lz >> 1);
    xh = x >> 16; // use for overflow check on divisions

    // first Newton iteration, guard against overflow in division
    q = 0xffff;
    if (xh < y) q = udiv_32_16 (x, y);
    y = (q + y) >> 1;

    if (lz < 10) {
        // second Newton iteration, guard against overflow in division
        q = 0xffff;
        if (xh < y) q = udiv_32_16 (x, y);
        y = (q + y) >> 1;
    }

    if (umul_16_16 (y, y) > x) y--; // adjust quotient if too large

    return y; // (uint16_t)sqrt((double)x)
}

static const uint8_t clz_tab[32] = 
{
    31, 22, 30, 21, 18, 10, 29,  2, 20, 17, 15, 13, 9,  6, 28, 1,
    23, 19, 11,  3, 16, 14,  7, 24, 12,  4,  8, 25, 5, 26, 27, 0
};

/* count leading zeros (for non-zero argument); a machine instruction on many architectures */
uint8_t clz (uint32_t a)
{
    a |= a >> 16;
    a |= a >> 8;
    a |= a >> 4;
    a |= a >> 2;
    a |= a >> 1;
    return clz_tab [0x07c4acdd * a >> 27];
}

/* 16x16->32 bit unsigned multiply; machine instruction on many architectures */
uint32_t umul_16_16 (uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;
}

/* 32/16->16 bit division. Note: Will overflow if x[31:16] >= y */
uint16_t udiv_32_16 (uint32_t x, uint16_t y)
{
    uint16_t r = x / y;
    return r;
}

int main (void)
{
    uint32_t x;
    uint16_t res, ref;
    
    printf ("testing 32-bit integer square root\n");
    x = 0;
    do {
        ref = (uint16_t)sqrt((double)x);
        res = my_isqrt (x);
        if (res != ref) {
            printf ("error: x=%08x  res=%08x  ref=%08x\n", x, res, ref);
            printf ("exhaustive test FAILED\n");
            return EXIT_FAILURE;
        }
        x++;
    } while (x);
    printf ("exhaustive test PASSED\n");
    return EXIT_SUCCESS;
}
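The effect of a single division-based Newton step can be seen in isolation with the following sketch. The helper name newton_step is hypothetical, and unlike my_isqrt above it forms the sum in 64 bits instead of guarding the division, purely to keep the illustration short.

```c
#include <stdint.h>

/* One division-based Newton step for sqrt(x): y' = (x / y + y) / 2.
   Requires y != 0. The sum is formed in 64 bits so it cannot wrap,
   even in the extreme case x = 0xffffffff, y = 1. */
static uint32_t newton_step (uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)(x / y) + y) >> 1);
}
```

With x = 1000000, a guess of 1024 (within about 2.5% of the root) reaches 1000 in one step, whereas a guess of 2000 goes to 1250, then 1025, then 1000. This is why a table-based initial guess that is already accurate to a few bits keeps the iteration count at one or two.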

Answer 2

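No, you need to introduce a logarithm somewhere; the fast floating-point square root works because of the logarithm in the bit representation.

The fastest approach is probably a lookup table for n -> floor(sqrt(n)). You don't store every value in the table, only the values at which the square root changes. Use binary search to find the result in the table in log(n) time.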
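One possible reading of this suggestion is sketched below, under some assumptions of mine: the table holds the 65536 perfect squares k*k (the points at which floor(sqrt(n)) changes), it is filled at start-up rather than stored as a constant, and the names square_tab, isqrt_tab_init and isqrt_lookup are hypothetical. Note that the binary search is itself a loop, although it always runs exactly 16 iterations and could be unrolled.

```c
#include <stdint.h>

/* Assumption for this sketch: one threshold per possible result,
   i.e. 65536 entries of uint32_t (256 KiB). */
#define ISQRT_TAB_SIZE 65536u

static uint32_t square_tab[ISQRT_TAB_SIZE];

/* Fill square_tab[k] = k*k. The largest entry, 65535*65535, fits in 32 bits.
   A production build would more likely use a precomputed const table. */
void isqrt_tab_init (void)
{
    for (uint32_t k = 0; k < ISQRT_TAB_SIZE; k++) {
        square_tab[k] = k * k;
    }
}

/* Binary search for the largest k with k*k <= x, which is floor(sqrt(x)) */
uint16_t isqrt_lookup (uint32_t x)
{
    uint32_t lo = 0;               /* invariant: square_tab[lo] <= x */
    uint32_t hi = ISQRT_TAB_SIZE;  /* exclusive upper bound on the answer */
    while (hi - lo > 1) {
        uint32_t mid = lo + ((hi - lo) >> 1);
        if (square_tab[mid] <= x) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return (uint16_t)lo;
}
```

isqrt_tab_init () must be called once before the first lookup. The table costs far more memory than the 32-entry table in Answer 1, which is the main trade-off of this approach.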

Answer 3

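Here is a variation on the code given by @njuffa that may be of interest. It is essentially branchless and avoids the lookup table for the initial guess, though it does perform three potentially expensive divisions rather than two.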

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

uint8_t clz (uint32_t a); // count leading zeros
uint32_t umul_16_16 (uint16_t a, uint16_t b); // 16x16 bit multiply
uint16_t udiv_32_16 (uint32_t x, uint16_t y); // 32/16 bit division

uint16_t my_isqrt (uint32_t x)
{
    uint16_t s, y;

    s = clz(x);                     // use clz(x | 1) if clz(0) is undefined
    x <<= s & 30u;                          // 2**30 <= x < 2**32 (or x == 0)
    y = 1u + (x >> 30);                     // 2 <= y <= 4
    y = (y << 1) + udiv_32_16(x >> 27, y);  // 8 <= y <= 16
    y = (y << 3) + udiv_32_16(x >> 21, y);  // 128 <= y <= 256
    y = (y << 7) + udiv_32_16(x >> 9, y);   // 32768 <= y < 65536
    return (y - (umul_16_16(y, y) > x)) >> (s >> 1);
}

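Some notes:

The definitions of clz, umul_16_16 and udiv_32_16 are exactly the same as in @njuffa's post.
It passes the same exhaustive test as @njuffa's answer.
It is designed to work for positive x, but conveniently happens to produce the correct result for x = 0 as well.
As written, it assumes that clz(0) is valid and returns either 31 or 32 (which is true of the implementation given by @njuffa). Some architectures may provide a clz instruction for which clz(0) gives undefined behaviour. In that case, use clz(x | 1) instead.
The first two divisions could actually be performed as 16-bit / 8-bit (or 16-bit / 16-bit) divisions.
While it's easy to prove the bound y <= 65536 on the penultimate line, proving that y < 65536, so that the result doesn't overflow uint16_t, requires case-by-case analysis.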
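The clz(x | 1) note can be illustrated with a compiler builtin whose result is undefined for a zero argument, such as __builtin_clz in GCC and Clang. The wrapper below is a sketch: the name clz_zero_safe is hypothetical, and the builtin path assumes that unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Count leading zeros, well defined for x == 0 (returns 31 in that case).
   Assumption: unsigned int is 32 bits wide when the builtin path is used. */
static inline uint8_t clz_zero_safe (uint32_t x)
{
    /* Setting bit 0 guarantees a non-zero argument and leaves the
       leading-zero count of every non-zero x unchanged. */
    x |= 1u;
#if defined(__GNUC__) || defined(__clang__)
    return (uint8_t)__builtin_clz (x);
#else
    /* Simple portable fallback; terminates because x is non-zero */
    uint8_t n = 0;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
#endif
}
```

Returning 31 for x = 0 matches what the answer's code expects, so this wrapper could stand in for clz in my_isqrt on such compilers.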

The algorithm is the same one that's used in Python's math.isqrt function, and described in the comments starting here: https://github.com/python/cpython/blob/0b58bac3e7877d722bdbd3c38913dba2cb212f13/Modules/mathmodule.c#L1577

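Description of the algorithm, which is based on Newton's method with careful control of the error terms:

First, we scale x by a power of 4 so that the input has either zero or one leading zero bits.
The initial value of y is either the floor or the ceiling of the square root of x >> 28.
Now y << 2 is an approximation to the square root of x >> 24. The next line applies one step of Newton's method to get a better approximation, which again can be proved to be either the floor or the ceiling of the square root of x >> 24.
The same again, but for y << 4 and x >> 16.
The same again, but for y << 8 and x. After this third and final Newton step, y is either the floor or the ceiling of the square root of x. Moreover, it is never equal to 65536 (the proof of this is a little awkward), so it still fits in a uint16_t.
On the last line, we check whether y is larger than the true square root of x and adjust if so, before shifting back to compensate for the original scaling of x.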
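The invariants in this description can be spot-checked with a small instrumented sketch. The helpers near_sqrt and check_stages are hypothetical names introduced here; the sketch repeats the three Newton steps on an already normalized x (2**30 <= x < 2**32) using plain division, and it checks sample values rather than proving anything.

```c
#include <stdint.h>

/* True if y equals floor(sqrt(v)) or ceil(sqrt(v)),
   which for y >= 1 is equivalent to (y-1)^2 < v < (y+1)^2.
   64-bit products avoid overflow. */
static int near_sqrt (uint64_t y, uint64_t v)
{
    return (y - 1) * (y - 1) < v && v < (y + 1) * (y + 1);
}

/* Re-runs the stages for a normalized x (2**30 <= x < 2**32).
   Returns a bit mask: bit k is set if the invariant after stage k holds. */
static unsigned check_stages (uint32_t x)
{
    unsigned ok = 0;
    uint32_t y = 1u + (x >> 30);                    /* initial value */
    ok |= (unsigned)near_sqrt (y, x >> 28) << 0;
    y = (y << 1) + (x >> 27) / y;                   /* first Newton step */
    ok |= (unsigned)near_sqrt (y, x >> 24) << 1;
    y = (y << 3) + (x >> 21) / y;                   /* second Newton step */
    ok |= (unsigned)near_sqrt (y, x >> 16) << 2;
    y = (y << 7) + (x >> 9) / y;                    /* third Newton step */
    ok |= (unsigned)(near_sqrt (y, x) && y < 65536u) << 3;
    return ok;
}
```

A return value of 0xf means every stage produced the floor or the ceiling of the corresponding square root and the final y fits in 16 bits. For instance, x = 0xffffffff passes through y = 4, 15, 256 and 65535.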

Answer 1
4, accepted, 2021-02-01 03:18:08

Answer 2
3, 2021-02-01 01:33:04

Answer 3
0, 2022-01-01 17:43:12
