
Fastest way to get square root in float value

I am trying to find the fastest way to compute the square root of any float number in C++. I use this kind of function in a huge particle-movement calculation: computing the distance between two particles, for example, needs a square root. Any suggestions would be very helpful. Below is the code I have tried:

#include <math.h>
#include <iostream>
#include <chrono>

using namespace std;
using namespace std::chrono;

#define CHECK_RANGE 100

inline float msqrt(float a)
{
    int i;
    for (i = 0;i * i <= a;i++);
    
    float lb = i - 1; //lower bound
    if (lb * lb == a)
        return lb;
    float ub = lb + 1; // upper bound
    float pub = ub; // previous upper bound
    for (int j = 0;j <= 20;j++)
    {
        float ub2 = ub * ub;
        if (ub2 > a)
        {
            pub = ub;
            ub = (lb + ub) / 2; // mid value of lower and upper bound
        }
        else
        {
            lb = ub; 
            ub = pub;
        }
    }
    return ub;
}

void check_msqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i++)
    {
        msqrt(i);
    }
}

void check_sqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i++)
    {
        sqrt(i);
    }
}

int main()
{
    auto start1 = high_resolution_clock::now();
    check_msqrt();
    auto stop1 = high_resolution_clock::now();

    auto duration1 = duration_cast<microseconds>(stop1 - start1);
    cout << "Time for check_msqrt = " << duration1.count() << " micro secs\n";


    auto start2 = high_resolution_clock::now();
    check_sqrt();
    auto stop2 = high_resolution_clock::now();

    auto duration2 = duration_cast<microseconds>(stop2 - start2);
    cout << "Time for check_sqrt = " << duration2.count() << " micro secs";
    
    //cout << msqrt(3);

    return 0;
}

The output of the code above shows that my implementation is about 4 times slower than sqrt from math.h. I need something faster than the math.h version.
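One caveat when reading these timings: check_msqrt and check_sqrt throw every result away, so an optimizing compiler may delete calls it can prove have no side effects (for std::sqrt this can depend on flags such as -fno-math-errno), and with CHECK_RANGE at 100 the total runtime is so short that timer resolution and noise can dominate. A minimal sketch of a harness that keeps the results live, assuming nothing beyond the standard library; time_and_sum is a hypothetical helper name, not from the original post:

```cpp
#include <chrono>
#include <cmath>   // std::sqrt, used in the example calls
#include <cstdio>

// Times `f` over the inputs 0, 1, ..., n-1 and returns the sum of its results.
// Consuming every result keeps the optimizer from treating the calls as dead
// code, which it may do when the return value is simply ignored.
template <class Fn>
double time_and_sum(const char* label, Fn f, int n)
{
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    double sum = 0.0;
    const auto start = clock::now();
    for (int i = 0; i < n; ++i)
        sum += f(static_cast<float>(i));
    const auto stop = clock::now();
    const auto us =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    // Printing the checksum also forces the sum to actually be computed.
    std::printf("%s: %lld us (checksum %.3f)\n", label,
                static_cast<long long>(us), sum);
    return sum;
}
```

Usage would look like `time_and_sum("std::sqrt", [](float x) { return std::sqrt(x); }, 1'000'000);` followed by the same call with `msqrt`. A larger `n` makes the measured interval long enough to be meaningful.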

In short, I do not think it is possible to implement something generally faster than the standard library version of sqrt.

Performance is a very important parameter when implementing standard library functions, and it is fair to assume that a function as commonly used as sqrt is optimized as much as possible.

Beating the standard library function would require a special case, such as:

  • Availability of a suitable assembler instruction (or other specialized hardware support) on a particular system for which the standard library has not been specialized.
  • Knowledge of the needed range or precision. The standard library function must handle all cases specified by the standard. If the application only needs a subset of that, or only requires an approximate result, then an optimization may be possible.
  • A mathematical reduction of the calculations, or combining some calculation steps in a smart way so that an efficient implementation can be made for that combination.
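The third point is often the easiest win in a particle simulation like the one in the question: many uses of a distance only compare it against a threshold (a neighbor cutoff, a collision radius), and because both sides are non-negative, $d < r$ is equivalent to $d^2 < r^2$, so no square root is needed at all. A sketch, assuming a hypothetical Particle struct with float coordinates (not from the original post):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle type, for illustration only.
struct Particle {
    float x, y, z;
};

// Squared Euclidean distance: no square root involved.
inline float dist2(const Particle& a, const Particle& b)
{
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Counts pairs closer than `cutoff`. Since sqrt(d2) < cutoff is equivalent
// to d2 < cutoff * cutoff for non-negative values, sqrt is skipped entirely.
std::size_t count_close_pairs(const std::vector<Particle>& ps, float cutoff)
{
    const float cutoff2 = cutoff * cutoff;  // square the threshold once
    std::size_t count = 0;
    for (std::size_t i = 0; i < ps.size(); ++i)
        for (std::size_t j = i + 1; j < ps.size(); ++j)
            if (dist2(ps[i], ps[j]) < cutoff2)
                ++count;
    return count;
}
```

When the actual distance value is required (for example, to normalize a direction vector), the square root cannot be avoided this way.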

Here's an alternative to binary search. It may not be as fast as std::sqrt (I haven't tested it), but it will definitely be faster than your binary search.

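#include <cmath>   // std::frexp, std::ldexp, std::isnan, INFINITY, NAN

auto
Sqrt(float x)
{
    using F = decltype(x);
    // sqrt(+/-0) = +/-0, sqrt(+inf) = +inf, sqrt(NaN) = NaN
    if (x == 0 || x == INFINITY || std::isnan(x))
        return x;
    if (x < 0)
        return F{NAN};
    // Range reduction: x = m * 2^e with m in [1/2, 1)
    int e;
    x = std::frexp(x, &e);
    // Make e even so that sqrt(2^e) = 2^(e/2) exactly; m is now in [1/4, 1)
    if (e % 2 != 0)
    {
        ++e;
        x /= 2;
    }
    // Least-squares quadratic fit to sqrt on [1/4, 1] as the initial guess
    auto y = (F{-160}/567*x + F{2'848}/2'835)*x + F{155}/567;
    // Two Newton-Raphson iterations: y <- (y + x/y) / 2
    y = (y + x/y)/2;
    y = (y + x/y)/2;
    return std::ldexp(y, e/2);
}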

After getting ±0, NaN, infinity, and negatives out of the way, it works by decomposing the float into a mantissa $m \in [1/4, 1)$ times $2^e$, where $e$ is an even integer. The answer is then $\sqrt{m} \cdot 2^{e/2}$.
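The range reduction is easy to check by hand: `std::frexp(10.0f, &e)` yields $0.625$ with $e = 4$, so $\sqrt{10} = \sqrt{0.625} \cdot 2^{2} \approx 3.162$. For $x = 1$, frexp yields $0.5 \cdot 2^{1}$; the odd exponent is bumped to 2 and the mantissa halved to $0.25$, giving $\sqrt{0.25} \cdot 2^{1} = 1$. A small sketch isolating that step (reduce_even is a hypothetical helper name, not part of the answer):

```cpp
#include <cmath>

// Splits x > 0 into m * 2^e with m in [1/4, 1) and e even, mirroring the
// range reduction in Sqrt. std::frexp returns m in [1/2, 1); an odd
// exponent is made even by incrementing it and halving m.
inline float reduce_even(float x, int& e)
{
    float m = std::frexp(x, &e);
    if (e % 2 != 0) {
        ++e;
        m /= 2;
    }
    return m;
}
```

The square root is then `std::ldexp(std::sqrt(m), e / 2)`; both scalings by powers of two are exact, so all the rounding error comes from approximating $\sqrt{m}$.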

The square root of the mantissa is first estimated with a least-squares quadratic curve fit over the range $[1/4, 1]$. That good guess is then refined by two iterations of Newton–Raphson. This will get you within 1 ulp of the correctly rounded result. A good std::sqrt will typically get that last bit correct.
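The 1-ulp figure can be checked empirically. A sketch, assuming IEEE-754 binary32 float and a correctly rounded std::sqrt (IEEE 754 requires square root to be correctly rounded); ulp_distance and max_ulp_error are hypothetical helper names. It repeats Sqrt with an explicit float return type so it compiles on its own, and walks a strided sample of the positive finite floats, subnormals included:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Sqrt from the answer above: quadratic initial guess + two Newton steps.
inline float Sqrt(float x)
{
    if (x == 0 || x == INFINITY || std::isnan(x))
        return x;
    if (x < 0)
        return NAN;
    int e;
    x = std::frexp(x, &e);
    if (e % 2 != 0) {
        ++e;
        x /= 2;
    }
    float y = (-160.0f / 567 * x + 2848.0f / 2835) * x + 155.0f / 567;
    y = (y + x / y) / 2;
    y = (y + x / y) / 2;
    return std::ldexp(y, e / 2);
}

// Distance in ulps between two finite, non-negative floats. For such values
// the IEEE-754 bit patterns are ordered the same way as the values.
inline std::uint32_t ulp_distance(float a, float b)
{
    std::uint32_t ia, ib;
    std::memcpy(&ia, &a, sizeof a);
    std::memcpy(&ib, &b, sizeof b);
    return ia > ib ? ia - ib : ib - ia;
}

// Largest ulp distance between Sqrt and std::sqrt over every `stride`-th
// positive finite float (bit patterns 1 .. 0x7f7fffff).
inline std::uint32_t max_ulp_error(std::uint32_t stride)
{
    std::uint32_t worst = 0;
    for (std::uint64_t bits = 1; bits <= 0x7f7fffffu; bits += stride) {
        const std::uint32_t b32 = static_cast<std::uint32_t>(bits);
        float x;
        std::memcpy(&x, &b32, sizeof x);
        const std::uint32_t d = ulp_distance(Sqrt(x), std::sqrt(x));
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Calling `max_ulp_error(1)` visits every positive finite float; it is much slower but exhaustive.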

I have also tried the algorithm mentioned in https://en.wikipedia.org/wiki/Fast_inverse_square_root, but did not get the desired result. Please check:

#include <math.h>
#include <iostream>
#include <chrono>

#include <bit>
#include <limits>
#include <cstdint>

using namespace std;
using namespace std::chrono;

#define CHECK_RANGE 10000

inline float msqrt(float a)
{
    int i;
    for (i = 0;i * i <= a;i++);
    
    float lb = i - 1; //lower bound
    if (lb * lb == a)
        return lb;
    float ub = lb + 1; // upper bound
    float pub = ub; // previous upper bound
    for (int j = 0;j <= 20;j++)
    {
        float ub2 = ub * ub;
        if (ub2 > a)
        {
            pub = ub;
            ub = (lb + ub) / 2; // mid value of lower and upper bound
        }
        else
        {
            lb = ub; 
            ub = pub;
        }
    }
    return ub;
}

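/* mentioned here ->  https://en.wikipedia.org/wiki/Fast_inverse_square_root */
inline float Q_sqrt(float number)
{
    // Reinterpret the float's bits as an integer. std::bit_cast (C++20, <bit>)
    // is well-defined, whereas reading an inactive union member is not in C++.
    std::uint32_t i = std::bit_cast<std::uint32_t>(number);
    i = 0x5f3759df - (i >> 1);            // initial guess for 1/sqrt(number)
    float y = std::bit_cast<float>(i);
    y *= 1.5F - (number * 0.5F * y * y);  // one Newton-Raphson step
    return 1/y;                           // sqrt(number) = 1 / (1/sqrt(number))
}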

void check_Qsqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i++)
    {
        Q_sqrt(i);
    }
}

void check_msqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i++)
    {
        msqrt(i);
    }
}

void check_sqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i++)
    {
        sqrt(i);
    }
}

int main()
{
    auto start1 = high_resolution_clock::now();
    check_msqrt();
    auto stop1 = high_resolution_clock::now();

    auto duration1 = duration_cast<microseconds>(stop1 - start1);
    cout << "Time for check_msqrt = " << duration1.count() << " micro secs\n";


    auto start2 = high_resolution_clock::now();
    check_sqrt();
    auto stop2 = high_resolution_clock::now();

    auto duration2 = duration_cast<microseconds>(stop2 - start2);
    cout << "Time for check_sqrt = " << duration2.count() << " micro secs\n";
    
    auto start3 = high_resolution_clock::now();
    check_Qsqrt();
    auto stop3 = high_resolution_clock::now();

    auto duration3 = duration_cast<microseconds>(stop3 - start3);
    cout << "Time for check_Qsqrt = " << duration3.count() << " micro secs\n";

    //cout << Q_sqrt(3);
    //cout << sqrt(3);
    //cout << msqrt(3);
    return 0;
}
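Two likely reasons this attempt disappointed. First, with the magic constant 0x5f3759df and a single Newton–Raphson step, the approximation is only good to roughly 0.175% relative error (the commonly quoted figure for this constant), far coarser than a full float result. Second, Q_sqrt spends a division in its return statement, even though $\sqrt{x} = x \cdot (1/\sqrt{x})$ needs only a multiply. A sketch of the multiply form plus an error check, assuming IEEE-754 binary32 and positive, normal inputs (the bit trick breaks down for zero and subnormals); fast_sqrt and max_rel_error are hypothetical names. Whether it actually beats std::sqrt depends on the CPU and compiler flags, so it should be measured:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Approximate sqrt via the fast inverse square root trick and one
// Newton-Raphson step, then sqrt(x) = x * (1/sqrt(x)) to avoid a division.
// Only meaningful for positive, normal x.
inline float fast_sqrt(float x)
{
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);     // well-defined type pun in C++
    i = 0x5f3759df - (i >> 1);         // initial guess for 1/sqrt(x)
    float y;
    std::memcpy(&y, &i, sizeof y);
    y *= 1.5f - (x * 0.5f * y * y);    // one Newton-Raphson step
    return x * y;
}

// Largest relative error of fast_sqrt against a double-precision sqrt,
// sampled geometrically: x = lo, lo * factor, lo * factor^2, ... <= hi.
inline double max_rel_error(float lo, float hi, float factor)
{
    double worst = 0.0;
    for (float x = lo; x <= hi; x *= factor) {
        const double exact = std::sqrt(static_cast<double>(x));
        const double err = std::fabs(fast_sqrt(x) - exact) / exact;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Adding a second Newton step (repeating the `y *= ...` line) shrinks the error substantially at the cost of a few more multiplies; the precision actually required by the simulation should decide how many steps are worth it.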

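Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.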

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM