
Approximating cosine on [0,pi] using only single precision floating point

I'm currently working on an approximation of the cosine. Since the final target device is a self-developed system with a 32-bit floating-point ALU / LU and there is a specialized C compiler for it, I am not able to use the C library math functions (cosf, ...). I am aiming to code various methods that differ in terms of accuracy and number of instructions / cycles.

I've already tried a lot of different approximation algorithms, starting from fdlibm, Taylor expansion, Padé approximation, the Remez algorithm using Maple, and so on.

But as soon as I implement them using only float precision, there is a significant loss of precision. And to be sure: I know that with double precision a much higher precision is no problem at all.

Right now, I have some approximations which are exact up to a few thousand ulp around pi/2 (the range where the largest errors occur), and I feel that I am limited by the single-precision conversions.

To address the topic of argument reduction: the input is in radians. I assume that an argument reduction would cause even more precision loss due to divisions / multiplications. Since my overall input range is only 0..pi, I decided to reduce the argument to 0..pi/2.

Therefore my question is: does anybody know a single-precision approximation to the cosine function with high accuracy (and, in the best case, high efficiency)? Are there any algorithms that optimize approximations for single precision? Do you know whether the built-in cosf function computes its values internally with single or double precision?

#include <stdint.h>

float ua_cos_v2(float x)
{
    float output;
    float myPi = 3.1415927410125732421875f;
    if (x < 0) x = -x;
    int quad = (int32_t)(x*0.63661977236f);//quad = x/(pi/2) = x*2/pi
    if (x<1.58f && x> 1.57f) //exclude approximation around pi/2
    {
        /* polynomial in d = x - pi/2 (pi/2 rounded to float) */
        float d = x - 1.57079637050628662109375f;
        output = -d
                 - 2.0e-12f * d * d
                 + 0.16666667163372039794921875f * d * d * d
                 + 2.0e-13f * d * d * d * d
                 + 0.000198412701138295233249664306640625f * d * d * d * d * d * d * d;
        output -= 4.37E-08f;
    }
    else {
        float param_x;
        int param_quad = -1;
        switch (quad)
        {
        case 0:
            param_x = x;
            break;
        case 1:
            param_x = myPi - x;
            param_quad = 1;
            break;
        case 2:
            param_x = x - myPi;
            break;
        case 3:
            param_x = 2 * myPi - x;
            break;
        }
        float c1 = 1.0f,
            c2 = -0.5f,
            c3 = 0.0416666679084300994873046875f,
            c4 = -0.001388888922519981861114501953125f,
            c5 = 0.00002480158218531869351863861083984375f,
            c6 = -2.75569362884198199026286602020263671875E-7f,
            c7 = 2.08583283978214240050874650478363037109375E-9f,
            c8 = -1.10807162057025010426514199934899806976318359375E-11f;
        float _x2 = param_x * param_x;
        output = c1 + _x2*(c2 + _x2*(c3 + _x2*(c4 + _x2*(c5 + _x2*(c6 + _x2*(c7 
        + _x2* c8))))));
        if (param_quad == 1 || param_quad == 0)
            output = -output;
    }
    return output;
}


If I have forgotten any information, please do not hesitate to ask!

Thanks in advance.

It is certainly possible to compute cosine on [0, π] with any desired error bound ≥ 0.5 ulp using just native precision operations. However, the closer the target is to a correctly rounded function, the more up-front design work and computational work at runtime are required.

Transcendental function implementations typically consist of argument reduction, core approximation(s), and a final fixup to counteract the argument reduction. In cases where the argument reduction involves subtraction, catastrophic cancellation needs to be avoided by explicitly or implicitly using higher precision. Implicit techniques can be designed to rely just on native precision computation, for example by splitting a constant like π/2 into an unevaluated sum such as 1.57079637e+0f - 4.37113883e-8f when using IEEE-754 binary32 (single precision).
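
As a side note, the high and low parts of such a split can be derived offline. Below is a minimal sketch, assuming a build machine with double precision available (the target device itself never needs it):

#include <math.h>
#include <stdio.h>

/* Offline helper (sketch): split pi/2 into an unevaluated sum hi + lo of two
   binary32 values. The printed constants are pasted into the target source;
   the target itself performs no double-precision arithmetic. */
int main (void)
{
    double half_pi = acos (-1.0) / 2.0;          /* pi/2 in double precision */
    float hi = (float)half_pi;                   /* nearest binary32 value */
    float lo = (float)(half_pi - (double)hi);    /* representable remainder */
    printf ("half_pi_hi = %15.8e\nhalf_pi_lo = %15.8e\n", hi, lo);
    return 0;
}

Computing (half_pi_hi - x) + half_pi_lo then yields π/2 - x without catastrophic cancellation for x near π/2, which is the pattern used in the code below.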

Achieving high accuracy with native precision computation is a lot easier when the hardware provides a fused multiply-add (FMA) operation. OP did not specify whether their target platform provides this operation, so I will first show a very simple approach offering moderate accuracy (maximum error < 5 ulps) relying just on multiplies and adds. I am assuming hardware that adheres to the IEEE-754 standard, and that float is mapped to the IEEE-754 binary32 format.

The following is based on a blog post by Colin Wallace titled "Approximating sin(x) to 5 ULP with Chebyshev polynomials", which is not available online at the time of writing. I originally retrieved it here, and Google presently retains a cached copy here. They propose to approximate sine on [-π, π] by using a polynomial in x² of sin(x)/(x*(x²-π²)), then multiplying this by x*(x²-π²). A standard trick to compute a²-b² more accurately is to rewrite it as (a-b) * (a+b). Representing π as an unevaluated sum of two floating-point numbers pi_high and pi_low avoids catastrophic cancellation during subtraction, which turns the computation x²-π² into ((x - pi_hi) - pi_lo) * ((x + pi_hi) + pi_lo).

The polynomial core approximation should ideally be a minimax approximation, which minimizes the maximum error. I have done so here. Various standard tools like Maple or Mathematica can be used for this, or one can create one's own code based on the Remez algorithm.

For a cosine computation on [0, π] we can make use of the fact that cos(t) = sin(π/2 - t). Substituting x = π/2 - t into x * (x - π) * (x + π) yields (π/2 - t) * (-π/2 - t) * (3π/2 - t). The constants can be split into high and low parts (or head and tail, to use another common idiom) as before.

/* Approximate cosine on [0, PI] with maximum error of 4.704174 ulp */
float cosine (float x)
{
    const float half_pi_hi       =  1.57079637e+0f; //  0x1.921fb6p+0
    const float half_pi_lo       = -4.37113883e-8f; // -0x1.777a5cp-25
    const float three_half_pi_hi =  4.71238899e+0f; //  0x1.2d97c8p+2
    const float three_half_pi_lo = -1.19248806e-8f; // -0x1.99bc5cp-27
    float p, s, hpmx, thpmx, nhpmx;

    /* cos(x) = sin (pi/2 - x) = sin (hpmx) */
    hpmx = (half_pi_hi - x) + half_pi_lo;               // pi/2-x
    thpmx = (three_half_pi_hi - x) + three_half_pi_lo;  // 3*pi/2 - x
    nhpmx = (-half_pi_hi - x) - half_pi_lo;             // -pi/2 - x

    /* P(hpmx*hpmx) ~= sin (hpmx) / (hpmx * (hpmx * hpmx - pi * pi)) */
    s = hpmx * hpmx;
    p =         1.32729383e-10f;
    p = p * s - 2.33177868e-8f;
    p = p * s + 2.52223435e-6f;
    p = p * s - 1.73503853e-4f;
    p = p * s + 6.62087463e-3f;
    p = p * s - 1.01321176e-1f;
    return hpmx * nhpmx * thpmx * p;
}
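
One way to check a bound like "maximum error of 4.704174 ulp" is to compare every binary32 value in [0, π] exhaustively against a higher-precision reference. Below is a minimal sketch of such a scan, assuming the cosine() routine above is in the same translation unit and that the host's double-precision cos() is accurate enough to serve as the reference:

#include <math.h>
#include <stdio.h>

float cosine (float x);   /* the approximation shown above */

/* Sketch: exhaustive ulp-error scan of cosine() on [0, PI]. The error is
   measured in units of the ulp of the float nearest to the reference value.
   Roughly 10^9 test points, so expect the scan to take a while. */
static double ulp_error (float approx, double ref)
{
    int expo;
    frexp ((float)ref, &expo);                 /* exponent of nearest float */
    double one_ulp = ldexp (1.0, expo - 24);   /* ulp at that magnitude */
    return fabs ((double)approx - ref) / one_ulp;
}

int main (void)
{
    double max_err = 0.0;
    float worst_x = 0.0f;
    for (float x = 0.0f; x <= 3.14159274f; x = nextafterf (x, 4.0f)) {
        double err = ulp_error (cosine (x), cos ((double)x));
        if (err > max_err) { max_err = err; worst_x = x; }
    }
    printf ("max. error = %.6f ulp at x = %15.8e\n", max_err, worst_x);
    return 0;
}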

Below I am showing a classical approach which first reduces the argument into [-π/4, π/4] while recording the quadrant. The quadrant then tells us whether we need to compute a polynomial approximation to the sine or to the cosine on this primary approximation interval, and whether we need to flip the sign of the final result. This code assumes that the target platform supports the FMA operation specified by IEEE-754, and that it is mapped via the standard C function fmaf() for single precision.

The code is straightforward except for the float-to-int conversion with rounding mode to-nearest-or-even that is used to compute the quadrant. It is performed by the "magic number addition" method and combined with the multiplication by 2/π (equivalent to a division by π/2). The maximum error is less than 1.5 ulps.
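
For reference, the magic-number rounding trick can be illustrated in isolation. A minimal sketch, assuming round-to-nearest-even mode, |q| below about 2^22, and a compiler that does not re-associate floating-point expressions:

/* Sketch: round a float to the nearest integer (ties to even) without an
   explicit float-to-int conversion. Adding 1.5 * 2**23 pushes the fraction
   bits out of the 24-bit significand, so the addition itself performs the
   rounding; subtracting the constant again recovers that value exactly. */
static float round_nearest_even (float q)
{
    return (q + 12582912.0f) - 12582912.0f;   /* 12582912 = 1.5 * 2**23 */
}

In my_cosf() below, the addition of the magic constant is folded into the multiplication by 2/π via fmaf(), so the quadrant value j is obtained in a single operation.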

/* compute cosine on [0, PI] with maximum error of 1.429027 ulp */
float my_cosf (float a)
{
    const float half_pi_hi =  1.57079637e+0f; //  0x1.921fb6p+0
    const float half_pi_lo = -4.37113883e-8f; // -0x1.777a5cp-25
    float c, j, r, s, sa, t;
    int i;

    /* subtract closest multiple of pi/2 giving reduced argument and quadrant */
    j = fmaf (a, 6.36619747e-1f, 12582912.f) - 12582912.f; // 2/pi, 1.5 * 2**23
    a = fmaf (j, -half_pi_hi, a);
    a = fmaf (j, -half_pi_lo, a);

    /* phase shift of pi/2 (one quadrant) for cosine */
    i = (int)j;
    i = i + 1;

    sa = a * a;
    /* Approximate cosine on [-PI/4,+PI/4] with maximum error of 0.87444 ulp */
    c =               2.44677067e-5f;  //  0x1.9a8000p-16
    c = fmaf (c, sa, -1.38877297e-3f); // -0x1.6c0efap-10
    c = fmaf (c, sa,  4.16666567e-2f); //  0x1.555550p-5
    c = fmaf (c, sa, -5.00000000e-1f); // -0x1.000000p-1
    c = fmaf (c, sa,  1.00000000e+0f); //  1.00000000p+0
    /* Approximate sine on [-PI/4,+PI/4] with maximum error of 0.64196 ulp */
    s =               2.86567956e-6f;  //  0x1.80a000p-19
    s = fmaf (s, sa, -1.98559923e-4f); // -0x1.a0690cp-13
    s = fmaf (s, sa,  8.33338592e-3f); //  0x1.111182p-7
    s = fmaf (s, sa, -1.66666672e-1f); // -0x1.555556p-3
    t = a * sa;
    s = fmaf (s, t, a);

    /* select sine approximation or cosine approximation based on quadrant */
    r = (i & 1) ? c : s;
    /* adjust sign based on quadrant */
    r = (i & 2) ? (0.0f - r) : r;

    return r;
}

As it turns out, in this particular case the use of FMA provides only a tiny benefit in terms of accuracy. If I replace each call fmaf(a,b,c) with ((a)*(b)+(c)), the maximum error increases minimally, to 1.451367 ulps, that is, it stays below 1.5 ulps.
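
If the target lacks an FMA instruction altogether, that substitution can be made explicit. A minimal sketch, where mul_add() is a name introduced here purely for illustration and the fmaf() calls in my_cosf() would be renamed accordingly:

/* Sketch: separately rounded multiply-add as a stand-in for fmaf() on targets
   without hardware FMA. Per the measurement above, the maximum error of
   my_cosf() then grows from 1.429027 ulp to 1.451367 ulp. */
static inline float mul_add (float a, float b, float c)
{
    return a * b + c;   /* two roundings instead of one */
}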

I see @njuffa has a good approach, yet I want to pose another approach given:

  • The angle is likely originally in degrees, not radians; take advantage of that.
  • It does not depend on float being IEEE.
  • fma may be weak, so it is not used.

Perform range reduction using integer math, then find the answer via a self-adjusting Taylor series.

#include <assert.h>

static float my_sinf_helper(float xx, float term, unsigned n) {
  if (term + 1.0f == 1.0f) {
    return term;
  }
  return term - my_sinf_helper(xx, xx * term / ((n + 1) * (n + 2)), n + 2);
}

static float my_cosf_helper(float xx, float term, unsigned n) {
  if (term + 1.0f == 1.0f) {
    return term;
  }
  return term - xx * my_cosf_helper(xx, term / ((n + 1) * (n + 2)), n + 2);
}

// valid for [-pi/4, +pi/4]
static float my_sinf_primary(float x) {
  return x * my_sinf_helper(x * x, 1.0, 1);
}

// valid for [-pi/4, +pi/4]
static float my_cosf_primary(float x) {
  return my_cosf_helper(x * x, 1.0, 0);
}

#define MY_PIf 3.1415926535897932384626433832795f
#define D2Rf(d) ((d)*(MY_PIf/180))

float my_cosdf(float x) {
  if (x < 0) {x = -x;}                             // cosine is an even function
  unsigned long long ux = (unsigned long long) x;  // whole degrees
  x -= (float) ux;                                 // fractional degrees in [0, 1)
  unsigned ux_primary = ux % 360u;                 // whole degrees reduced mod 360
  int uxq = ux_primary%90;
  if (uxq >= 45) uxq -= 90;                        // center whole part in [-45, 44]
  x += uxq;                                        // reduced angle in degrees, about [-45, 45)
  switch (ux_primary/45) {                         // octant selects sin/cos and sign
    case 7: //
    case 0: return my_cosf_primary(D2Rf(x));
    case 1: //
    case 2: return -my_sinf_primary(D2Rf(x));
    case 3: //
    case 4: return -my_cosf_primary(D2Rf(x));
    case 5: //
    case 6: return my_sinf_primary(D2Rf(x));
  }
  assert(0);
  return 0;
}

Test code

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#define DBL_FMT "%+24.17e"

typedef struct {
  double x, y0, y1, adiff;
  unsigned n;
} test;

test worst = {0};

int my_cosd_test(float x) {
  test t;
  t.x = x;
  t.y0 = cos(x*acos(-1)/180);
  t.y1 = my_cosdf(x);
  t.adiff = fabs(t.y1 - t.y0);
  if (t.adiff > worst.adiff) {
    t.n = worst.n + 1;
    printf("n:%3u x:" DBL_FMT " y0:" DBL_FMT " y1:" DBL_FMT " d:" DBL_FMT "\n", //
        t.n, t.x, t.y0, t.y1, t.adiff);
    fflush(stdout);
    worst = t;
    if (t.n > 100)
      exit(-1);
  }
  return t.adiff != 0.0;
}

float rand_float_finite(void) {
  union {
    float f;
    unsigned char uc[sizeof(float)];
  } u;
  do {
    for (size_t i = 0; i < sizeof u.uc / sizeof u.uc[0]; i++) {
      u.uc[i] = (unsigned char) rand();
    }
  } while (!isfinite(u.f) || fabs(u.f) > 5000);
  return u.f;
}

int my_cosd_tests(unsigned n) {
  my_cosd_test(0.0);
  for (unsigned i = 0; i < n; i++) {
    my_cosd_test(rand_float_finite());
  }
  return 0;
}

int main(void) {
  my_cosd_tests(1000000);
}

Worst case error: +8.2e-08. Max recursion depth noted: 6.

n: 14 x:+3.64442993164062500e+03 y0:+7.14107074054115110e-01 y1:+7.14107155799865723e-01 d:+8.17457506130381262e-08

I'll review more later. I do see more extensive testing reaching about 9e-08 worst-case error, and some TBD issue with x > about 1e10.
