AVX intrinsic _mm256_rsqrt_ps has a much larger relative error than the Intrinsics Guide says it should

Question

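The Intel Intrinsics Guide says that the intrinsic _mm256_rsqrt_ps has a maximum relative error of 1.5*2^-12. However, when I compare the result of _mm256_rsqrt_ps with a standard C++ computation of the reciprocal square root (1.0 / sqrt(x)), I get a relative error much larger than 1.5*2^-12.

I tested this with the following program: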

#include <immintrin.h>
#include <iostream>
#include <math.h>

void test(float x) {
  float resP = _mm256_cvtss_f32(_mm256_rsqrt_ps(_mm256_set1_ps(x)));
  float res = 1.0 / sqrt(x);
  float relErr = fabs(resP - res) / res;
  std::cout << "x = " << x << std::endl;
  std::cout << "resP = " << resP << std::endl;
  std::cout << "res = " << res << std::endl;
  std::cout << "relErr = " << relErr << std::endl;
}

int main() {
  test(1e30);
  test(1e-30);
  test(1e17);
  test(1e-17);
}

It prints the following:

    x = 1e+30
    resP = 1.00007e-15
    res = 1e-15
    relErr = 6.80803e-05
    x = 1e-30
    resP = 9.99868e+14
    res = 1e+15
    relErr = 0.0001316
    x = 1e+17
    resP = 3.16186e-09
    res = 3.16228e-09
    relErr = 0.000132569
    x = 1e-17
    resP = 3.16277e+08
    res = 3.16228e+08
    relErr = 0.000154825

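As you can see, the relative error is significantly larger than 1.5*2^-12.

The _mm256_rcp_ps instruction also seems to have a much larger relative error than the Intrinsics Guide states.

Am I doing something wrong? Am I misreading the Intrinsics Guide? Or is the guide wrong?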

Answer 1

Your relative errors are within the bound.

1.5*2^-12 ≈ 0.000366

The bound is a power of 2, not a power of 10. The largest error you measured, 0.000154825, is well below it.
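To make the arithmetic concrete, here is a minimal sketch that evaluates the documented bound and checks the four relative errors printed in the question against it. The helper names (`rsqrtBound`, `withinBound`, `checkReportedErrors`) are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cmath>

// Documented maximum relative error of _mm256_rsqrt_ps: 1.5 * 2^-12.
// std::ldexp(1.5, -12) scales 1.5 by 2^-12.
double rsqrtBound() { return std::ldexp(1.5, -12); }

// True if a measured relative error does not exceed the documented bound.
bool withinBound(double relErr) { return relErr <= rsqrtBound(); }

// Checks the four relErr values copied from the question's output.
void checkReportedErrors() {
    const double reported[] = {6.80803e-05, 0.0001316, 0.000132569, 0.000154825};
    for (double e : reported) {
        assert(withinBound(e));
    }
}
```

Calling `checkReportedErrors()` from `main` should pass without triggering an assertion, since every reported value is below roughly 0.000366.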

Also, the guide does not claim this relative error with respect to a single-precision 1/sqrt(x); the bound is relative to the exact result.
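In the question's test, `res` is stored in a `float`, so the reference value is itself rounded to single precision. A sketch of measuring against a reference that is closer to exact, assuming double precision is accurate enough for this purpose (its own rounding error is many orders of magnitude below 1.5*2^-12), could look like this. The function name `relErrVsReference` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Relative error of a single-precision approximation of 1/sqrt(x),
// measured against a reference evaluated in double precision.
double relErrVsReference(float approx, double x) {
    assert(x > 0.0);                          // 1/sqrt(x) needs a positive input
    const double ref = 1.0 / std::sqrt(x);    // reference kept in double
    return std::fabs(static_cast<double>(approx) - ref) / ref;
}
```

In the question's `test` function, one would pass `resP` and `x` to this helper instead of dividing by the `float` value `res`.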

Answer 1: 6, accepted, 2022-09-27 12:11:39