Single precision in fixed-point arithmetic

Question

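I need 6 decimal places of precision when computing a Taylor series with fixed-point arithmetic. I have tried several fixed-point formats to reach 6 decimal places.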

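For example, with the s16.15 format (scaled by a left shift of 15) I got 2 decimal places of precision: 1 sign bit, 16 integer bits and 15 fractional bits.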

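With the s8.23 format (left shift of 23) I get at most 4 correct decimal places, and with s4.27 (left shift of 27) the precision stays the same. I had expected it to improve.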
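For reference, reading sI.F as 1 sign bit, I integer bits and F fractional bits in a 32-bit word, the step size (one LSB) of each format is $2^{-F}$, and six correct decimal places need an absolute error below $5\times10^{-7}$:

$$
\begin{aligned}
\text{s16.15:}\quad 2^{-15} &\approx 3.05\times10^{-5}, & \text{range } &[-2^{16},\,2^{16})\\
\text{s8.23:}\quad 2^{-23} &\approx 1.19\times10^{-7}, & \text{range } &[-2^{8},\,2^{8})\\
\text{s4.27:}\quad 2^{-27} &\approx 7.45\times10^{-9}, & \text{range } &[-2^{4},\,2^{4})
\end{aligned}
$$

On resolution alone, s8.23 and s4.27 both leave room for six decimals, which suggests the remaining error comes from how the intermediate products are computed rather than from the format itself.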

Below is the Taylor series expansion (in Horner form) used to compute the natural logarithm around a point a.

So q = x - a, where x is user input between 1 and 2 (the constants below correspond to a = 1.5).

  // Constants converted to s4.27 fixed-point format
  const int32_t con=0x0B8AA3B3; //1.44269504088895
  const int32_t c0=0x033E647E; //0.40546510810816
  const int32_t c1=0x05555555; //0.66666666666666
  const int32_t c2=0x01C71C72; //0.222222222222
  const int32_t c3=0x00CA4588; //0.0987654321
  const int32_t c4=0x006522C4; //0.04938271605
  const int32_t c5=0x0035F069; //0.02633744856
  const int32_t c6=0x001DF757; //0.01463191587

// Multiplication function (defined before use)
int32_t mul(int32_t x, int32_t y)
{
    int32_t mul;
    mul = (((x) >> 13) * ((y) >> 13)) >> 1; // for s4.27 format, the best possible right shift
    return mul;
}

// Taylor series expanded in Horner form
int32_t taylor = c0 + mul(q, (c1 - mul(q, (c2 + mul(q, (c3 - mul(q, (c4 - mul(q, (c5 + mul(q, c6)))))))))));

The code above is written in C.
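The hex constants above can be reproduced, assuming they come from c0 = ln 1.5 ≈ 0.405465, con = 1/ln 2 ≈ 1.442695 and ck = 1/(k·1.5^k) for k = 1..6 (an inference from the decimal comments, which match these forms). A minimal sketch with hypothetical helper names to_s4_27 and coeff_s4_27 that rounds to nearest instead of truncating:

```c
#include <assert.h>
#include <stdint.h>

#define S4_27_SCALE 134217728.0 /* 2^27: s4.27 has 27 fractional bits */

/* Hypothetical helper: convert a double to s4.27, rounding half away from
   zero (like lround) instead of truncating; avoids needing libm. */
static int32_t to_s4_27(double v) {
    double scaled = v * S4_27_SCALE; /* exact power-of-two scaling */
    assert(scaled > -2147483648.5 && scaled < 2147483647.5); /* must fit int32_t */
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Hypothetical helper: k-th series coefficient 1 / (k * a^k) with a = 1.5
   (assumed closed form, inferred from the decimal comments). */
static int32_t coeff_s4_27(int k) {
    const double a = 1.5;
    double a_pow = 1.0;
    for (int i = 0; i < k; ++i) {
        a_pow *= a; /* 1.5^k, exact in double for small k */
    }
    return to_s4_27(1.0 / (k * a_pow));
}
```

Each coeff_s4_27(k) should match the corresponding ck constant, and to_s4_27(0.40546510810816) gives c0. Generating the table this way avoids hand-conversion mistakes when trying another format.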

The result I need is 0.584963, but the result I get is 0.584949.

How can I get more precision?

Answer 1

OP's mul() throws away too much precision.

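((x)>>13)*((y)>>13) immediately discards the 13 least significant bits of both x and y, so each operand keeps only 27 - 13 = 14 fractional bits going into the multiply.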
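To quantify the loss, write the stored integers as $X = x\cdot 2^{27}$ and $Y = y\cdot 2^{27}$ and split off the low 13 bits, $X = 2^{13}X_h + a$, $Y = 2^{13}Y_h + b$ with $0 \le a, b < 2^{13}$. For non-negative operands, OP's mul() returns $\lfloor X_h Y_h/2 \rfloor = \lfloor (X-a)(Y-b)/2^{27} \rfloor$, whereas a full 64-bit product shifted once returns $\lfloor XY/2^{27} \rfloor$. Comparing each with the exact product $xy = XY/2^{54}$ gives the following sketch of a bound, with $0 \le \delta, \delta' < 1$ the discarded fractions:

$$
\begin{aligned}
xy - \frac{\lfloor X_h Y_h/2 \rfloor}{2^{27}} &= \frac{aY + bX - ab}{2^{54}} + \frac{\delta}{2^{27}} < (x + y)\,2^{-14} + 2^{-27},\\
xy - \frac{\lfloor XY/2^{27} \rfloor}{2^{27}} &= \frac{\delta'}{2^{27}} < 2^{-27}.
\end{aligned}
$$

For operands of the size in this problem (for example $q \approx 0.30$ times $c_1 \approx 0.67$), the first bound is about $5.9\times10^{-5}$, enough to disturb the fifth decimal place, while the second stays below $7.5\times10^{-9}$.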

Instead, perform a 64-bit multiplication:

int32_t mul_better(int32_t x, int32_t y) {
  int64_t mul = x;
  mul *= y;
  mul >>= 27;

  // Code may want to detect overflow here (not shown)

  return (int32_t) mul;
}
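A small side-by-side sketch (the names mul_preshift, mul_wide and compare_mul are hypothetical) that measures, for one pair of non-negative s4.27 operands, how far each multiply lands from the exact product. It is restricted to non-negative inputs below 2.0 so every shift acts on a non-negative value and nothing overflows:

```c
#include <assert.h>
#include <stdint.h>

#define S4_27_SCALE 134217728.0 /* 2^27 */

/* The question's mul(): drops the low 13 bits of each operand first. */
static int32_t mul_preshift(int32_t x, int32_t y) {
    return ((x >> 13) * (y >> 13)) >> 1;
}

/* The answer's approach: full 64-bit product, a single shift at the end. */
static int32_t mul_wide(int32_t x, int32_t y) {
    int64_t p = (int64_t)x * y;
    return (int32_t)(p >> 27);
}

/* Hypothetical helper: error (in real units) of each multiply against the
   product of the two s4.27 inputs' real values, computed in double. */
static void compare_mul(int32_t x, int32_t y, double *err_preshift, double *err_wide) {
    assert(x >= 0 && x < (1 << 28) && y >= 0 && y < (1 << 28)); /* 0 <= x, y < 2.0 */
    double exact = ((double)x / S4_27_SCALE) * ((double)y / S4_27_SCALE);
    *err_preshift = exact - mul_preshift(x, y) / S4_27_SCALE;
    *err_wide = exact - mul_wide(x, y) / S4_27_SCALE;
}
```

With x = 40749710 (0.303609 truncated to s4.27, as in the answer's foo) and y = c1 = 0x05555555, the pre-shifted version is off by roughly 2.6×10^-5, while the 64-bit version is off by less than 2^-27 ≈ 7.5×10^-9.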

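Better still, round the product to nearest (ties to even) before discarding the least significant bits. Simplifications are possible; the verbose code below is for illustration only.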

int32_t mul_better(int32_t x, int32_t y) {
  int64_t mul = x;
  mul *= y;                                  // exact 64-bit product, 54 fractional bits
  int32_t least = mul % ((int32_t)1 << 27);  // bits to discard; same sign as mul
  mul /= (int32_t)1 << 27;                   // truncates toward zero
  int carry = 0;
  if (least >= 0) {
    if (least > ((int32_t)1 << 26)) carry = 1;                       // above half: round up
    else if ((least == ((int32_t)1 << 26)) && (mul % 2)) carry = 1;  // tie: round to even
  } else {
    if (-least > ((int32_t)1 << 26)) carry = -1;                     // below -half: round down
    else if ((-least == ((int32_t)1 << 26)) && (mul % 2)) carry = -1; // tie: round to even
  }
  return (int32_t) (mul + carry);
}
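The answer notes that simplifications are possible. One possible simplification (a sketch, not the answer's own code; mul_rne is a hypothetical name) first turns C's truncating division into floor division, so the discarded remainder is always non-negative and a single test covers both signs:

```c
#include <assert.h>
#include <stdint.h>

#define S4_27_ONE ((int64_t)1 << 27) /* one output LSB in product units (also 1.0 in s4.27) */
#define HALF_LSB  ((int64_t)1 << 26) /* half an output LSB */

/* Hypothetical name: s4.27 multiply rounding to nearest, ties to even. */
static int32_t mul_rne(int32_t x, int32_t y) {
    int64_t p = (int64_t)x * y; /* exact product, 54 fractional bits */
    int64_t n = p / S4_27_ONE;  /* C division truncates toward zero, */
    int64_t r = p % S4_27_ONE;  /* so r has the same sign as p */
    if (r < 0) {                /* switch to floor division: 0 <= r < ONE */
        r += S4_27_ONE;
        n -= 1;
    }
    if (r > HALF_LSB || (r == HALF_LSB && n % 2 != 0)) {
        n += 1;                 /* above half, or a tie with odd n: round up */
    }
    assert(n >= INT32_MIN && n <= INT32_MAX); /* overflow check */
    return (int32_t)n;
}
```

For example, 1.5 LSB rounds to 2 and 2.5 LSB also rounds to 2 (ties go to the even neighbour), while -1.5 LSB rounds to -2.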

#include <stdint.h>
#include <stdio.h>

// c0..c6 are the s4.27 constants from the question.

int32_t mul(int32_t x, int32_t y) {
  int64_t mul = x;
  mul *= y;
  return mul >> 27;
}

void foo(double x) {
  int32_t q = (int32_t) (x * (1 << 27));  // **
  int32_t taylor =
      c0 + mul(q, (c1 - mul(q, (c2  + mul(q,
      (c3 - mul(q, (c4 - mul(q, (c5 + mul(q, c6)))))))))));
  printf("%f %f\n", x,  taylor * 1.0 / (1 << 27));
}

int main(void) {
  foo(0.303609);
}

Output

0.303609 0.584963

** Rounding could also be done here instead of simply truncating the FP value to an integer.
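A minimal sketch of that footnote (to_fix_trunc and to_fix_round are hypothetical names): the first mirrors the (int32_t) cast in foo, which truncates toward zero and can be off by up to 1 LSB; the second rounds half away from zero, keeping the conversion error within 0.5 LSB:

```c
#include <assert.h>
#include <stdint.h>

#define S4_27_SCALE 134217728.0 /* 2^27, s4.27 */

/* Same as foo(): the cast truncates toward zero. */
static int32_t to_fix_trunc(double x) {
    return (int32_t)(x * S4_27_SCALE);
}

/* Rounded alternative: round half away from zero before converting. */
static int32_t to_fix_round(double x) {
    double s = x * S4_27_SCALE;
    assert(s > -2147483648.5 && s < 2147483647.5); /* must fit in int32_t */
    return (int32_t)(s >= 0.0 ? s + 0.5 : s - 0.5);
}
```

For 0.303609 itself both give 40749710, because the fractional part of 0.303609·2^27 is below one half; the two only differ when the fractional part of |x|·2^27 is 0.5 or more.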

Answer 1 (score 2), accepted 2017-11-19 05:50:53
