How does a C double represent infinity?

Question

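I learned from the book Computer Systems: A Programmer's Perspective that the IEEE standard requires double-precision floating-point numbers to be represented using the following 64-bit binary format:

s: 1 bit for the sign
exp: 11 bits for the exponent
frac: 52 bits for the fraction

+infinity is represented as a special value with the following pattern:

s = 0
all exp bits are 1
all fraction bits are 0

And I think the full 64 bits of a double should be laid out in the following order:

(s)(exp)(frac)

So I wrote the following C code to verify it: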

//Check the infinity
double x1 = (double)0x7ff0000000000000;  // This should be the +infinity
double x2 = (double)0x7ff0000000000001; //  Note the extra ending 1, x2 should be NaN
printf("\nx1 = %f, x2 = %f sizeof(double) = %d", x1,x2, sizeof(x2));
if (x1 == x2)
    printf("\nx1 == x2");
else
    printf("\nx1 != x2");

But the result is:

x1 = 9218868437227405300.000000, x2 = 9218868437227405300.000000 sizeof(double) = 8
x1 == x2

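Why is the number a valid number rather than some infinity error?

Why is x1 == x2?

(I am using the MinGW GCC compiler.)

ADD 1

I modified the code as below and successfully validated the Infinity and NaN.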

//Check the infinity and NaN
unsigned long long x1 = 0x7ff0000000000000ULL; // +infinity as double
unsigned long long x2 = 0xfff0000000000000ULL; // -infinity as double
unsigned long long x3 = 0x7ff0000000000001ULL; // NaN as double
double y1 =* ((double *)(&x1));
double y2 =* ((double *)(&x2));
double y3 =* ((double *)(&x3));

printf("\nsizeof(long long) = %d", sizeof(x1));
printf("\nx1 = %f, x2 = %f, x3 = %f", x1, x2, x3); // %f is good enough for output
printf("\ny1 = %f, y2 = %f, y3 = %f", y1, y2, y3);

The result is:

sizeof(long long) = 8
x1 = 1.#INF00, x2 = -1.#INF00, x3 = 1.#SNAN0
y1 = 1.#INF00, y2 = -1.#INF00, y3 = 1.#QNAN0

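The detailed output looks a bit strange, but I think the point is clear.

PS: It seems that the pointer conversion is not necessary. Just use %f to tell the printf function to interpret the unsigned long long variable in double format.

ADD 2

Out of curiosity, I checked the bit representation of the variables with the following code.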

typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, int len)
{
    int i;
    for (i = len-1; i>=0; i--)
    {
        printf("%.2x", start[i]);
    }
    printf("\n");
}

And I tried the code below:

//check the infinity and NaN
unsigned long long x1 = 0x7ff0000000000000ULL; // +infinity as double
unsigned long long x2 = 0xfff0000000000000ULL; // -infinity as double
unsigned long long x3 = 0x7ff0000000000001ULL; // NaN as double
double y1 =* ((double *)(&x1));
double y2 =* ((double *)(&x2));
double y3 = *((double *)(&x3));

unsigned long long x4 = x1 + x2;  // I want to check (+infinity)+(-infinity)
double y4 = y1 + y2; // I want to check (+infinity)+(-infinity)

printf("\nx1: ");
show_bytes((byte_pointer)&x1, sizeof(x1));
printf("\nx2: ");
show_bytes((byte_pointer)&x2, sizeof(x2));
printf("\nx3: ");
show_bytes((byte_pointer)&x3, sizeof(x3));
printf("\nx4: ");
show_bytes((byte_pointer)&x4, sizeof(x4));

printf("\ny1: ");
show_bytes((byte_pointer)&y1, sizeof(y1));
printf("\ny2: ");
show_bytes((byte_pointer)&y2, sizeof(y2));
printf("\ny3: ");
show_bytes((byte_pointer)&y3, sizeof(y3));
printf("\ny4: ");
show_bytes((byte_pointer)&y4, sizeof(y4));

The output is:

x1: 7ff0000000000000

x2: fff0000000000000

x3: 7ff0000000000001

x4: 7fe0000000000000

y1: 7ff0000000000000

y2: fff0000000000000

y3: 7ff8000000000001

y4: fff8000000000000  // <== Different with x4

The strange part is that, although x1 and x2 have the same bit patterns as y1 and y2, the sum x4 is different from y4.

And

printf("\ny4=%f", y4);

gives this:

y4=-1.#IND00  // What does it mean???

Why are they different? And how is y4 obtained?

Answer 1

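First of all, 0x7ff0000000000000 is indeed the bit representation of a double infinity. But the cast does not set the bit representation; it converts the logical value of 0x7ff0000000000000, interpreted as a 64-bit integer. So you need to use some other way to set the bit pattern.

The straightforward way to set the bit pattern would be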

uint64_t bits = 0x7ff0000000000000;
double infinity = *(double*)&bits;

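However, this is undefined behavior. The C standard prohibits reading a value that has been stored as one fundamental type (uint64_t) as another fundamental type (double). This is known as the strict aliasing rule, and it allows the compiler to generate better code because it can assume that the order in which a read of one type and a write of another type are carried out is irrelevant.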

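The only exception to this rule is the character types: you are explicitly allowed to access the bytes of any object through a char pointer, which is what memcpy does. So you could try to use this code:

unsigned char bits[] = {0x7f, 0xf0, 0, 0, 0, 0, 0, 0};
double infinity;
memcpy(&infinity, bits, sizeof infinity);    //Needs <string.h>. Copying the bytes is allowed; dereferencing a double* that points into a char array would again break the aliasing (and alignment) rules.

Even though this is not undefined behavior anymore, it is still implementation-defined behavior: the order of the bytes in a double depends on your machine. The given code works on big-endian machines (for example the Power family in its traditional mode, or an ARM core configured as big-endian), but not on X86. For X86 you need this version:

unsigned char bits[] = {0, 0, 0, 0, 0, 0, 0xf0, 0x7f};
double infinity;
memcpy(&infinity, bits, sizeof infinity);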

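Since there is no guarantee that a machine stores floating-point values in the same byte order as integer values, there is really no way around the implementation-defined behavior. There are even machines that use byte orders like <1, 0, 3, 2>. I don't even want to know who came up with that brilliant idea, but it exists and we have to live with it.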
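A commonly used alternative, sketched here under the assumption that double is a 64-bit IEEE 754 type stored in the same byte order as uint64_t (true on mainstream x86 and ARM targets, but not guaranteed by the C standard), is to copy the bytes with memcpy. The helper names bits_to_double and double_to_bits are made up for this illustration. When only the value is needed, the INFINITY macro from <math.h> avoids bit patterns altogether.

```c
#include <assert.h>   /* static_assert (C11) */
#include <math.h>     /* INFINITY, isinf, isnan */
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Assumption: double occupies exactly 64 bits on this platform. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

/* Hypothetical helper: reinterpret a 64-bit pattern as a double.
   memcpy copies the object representation, so no aliasing rule is broken. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical helper: the inverse direction, double to bit pattern. */
static uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

On such a platform, bits_to_double(0x7ff0000000000000ULL) should compare equal to INFINITY, and double_to_bits(-INFINITY) should give back 0xfff0000000000000.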

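To your last question: floating-point arithmetic is inherently different from integer arithmetic. The bits have special meanings, and the floating-point unit takes that into account. In particular, special values like infinities, NaNs, and denormalized numbers are treated in a special way. And since +inf + -inf is defined to yield a NaN, your floating-point unit emits the bit pattern of a NaN. The integer unit doesn't know about infinities or NaNs, so it just interprets the bit pattern as a huge integer and happily performs an integer addition (which happens to overflow in this case). The resulting bit pattern is not that of a NaN. It happens to be the bit pattern of a really huge positive floating-point number (2^1023, to be precise), but that doesn't bear any meaning.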
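The difference between the two additions can be reproduced with a short sketch. It assumes 64-bit IEEE 754 doubles stored in the same byte order as uint64_t, and the helper names are invented for this example. It also suggests what y4 is: fff8000000000000 has all exponent bits set and a nonzero fraction, so it is a NaN (a quiet one with the sign bit set), which the Microsoft C runtime used by MinGW prints as -1.#IND.

```c
#include <assert.h>   /* static_assert (C11) */
#include <math.h>     /* isnan */
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Assumption: double occupies exactly 64 bits on this platform. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

/* Hypothetical helper: reinterpret a 64-bit pattern as a double. */
static double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Integer addition: the patterns are plain numbers, and unsigned
   arithmetic wraps around modulo 2^64. */
static uint64_t add_as_integers(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Floating-point addition: the FPU interprets sign, exponent and fraction. */
static double add_as_doubles(uint64_t a, uint64_t b)
{
    return double_from_bits(a) + double_from_bits(b);
}
```

Called with the patterns of +infinity and -infinity, add_as_integers yields 0x7fe0000000000000 (the pattern of 2^1023), while add_as_doubles yields a NaN whose exact sign and payload depend on the hardware.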

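Actually, there is a way to set the bit patterns of all values except NaNs in a portable way: given three variables containing the sign, exponent, and mantissa bits, you can do this: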

uint64_t sign = ..., exponent = ..., mantissa = ...;
double result;
assert(!(exponent == 0x7ff && mantissa));    //Can't set the bits of a NAN in this way.
if(exponent) {
    //This code does not work for denormalized numbers. And it won't honor the value of mantissa when the exponent signals NAN or infinity.
    result = mantissa + (1ull << 52);    //Add the implicit bit.
    result /= (1ull << 52);    //This makes sure that the exponent is logically zero (equals the bias), so that the next operation will work as expected.
    result *= pow(2, (double)((signed)exponent - 0x3ff));    //This sets the exponent.
} else {
    //This code works for denormalized numbers.
    result = mantissa;    //No implicit bit.
    result /= (1ull << 51);    //This ensures that the next operation works as expected.
    result *= pow(2, -0x3ff);    //Scale down to the denormalized range.
}
result *= (sign ? -1.0 : 1.0);    //This sets the sign.

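This uses the floating-point unit itself to move the bits into the right place. Since there is no way to interact with the mantissa bits of a NaN using floating-point arithmetic, it is not possible to include the generation of NaNs in this code. Well, you could generate a NaN, but you would have no control over its mantissa bit pattern.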

Answer 2

The initialization

double x1=(double)0x7ff0000000000000;

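is converting an integer literal to a double. You probably want to share the bitwise representation instead. This is implementation-specific (perhaps unspecified behavior), but you could use a union: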

union { double x; long long n; } u;
u.n = 0x7ff0000000000000LL;

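then use u.x. I am assuming that long long and double are both 64 bits on your machine. Endianness and floating-point representation also matter.

See also http://floating-point-gui.de/

Notice that not all processors are x86, and not all floating-point implementations are IEEE 754 (even if in 2014 most of them are). Your code might not behave the same on an ARM processor, e.g. in your tablet.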
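A fuller version of the union approach might look like the sketch below. The union and function names are made up for illustration; uint64_t replaces long long so that the integer member is exactly 64 bits, and the static assertion rejects platforms where double has a different size. Reading a union member other than the one last written reinterprets the stored bytes in C, though the result still depends on the platform using IEEE 754 with matching byte order.

```c
#include <assert.h>   /* static_assert (C11) */
#include <math.h>     /* isinf, for checking the results */
#include <stdint.h>

/* Hypothetical union for illustration: both members share the same storage. */
union double_bits {
    double   x;
    uint64_t n;
};

/* Assumption: double occupies exactly 64 bits on this platform. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

/* Write the integer member, then read the double member. */
static double double_from_pattern(uint64_t pattern)
{
    union double_bits u;
    u.n = pattern;
    return u.x;
}

/* Write the double member, then read the integer member. */
static uint64_t pattern_of_double(double value)
{
    union double_bits u;
    u.x = value;
    return u.n;
}
```

Under those assumptions, isinf(double_from_pattern(0x7ff0000000000000ULL)) is true, and pattern_of_double(1.0) returns 0x3ff0000000000000.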

Answer 3

You are converting the value to a double, and that is not going to work as you expect.

double x1=(double)0x7ff0000000000000; // Not setting the value directly

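To avoid this problem, you could interpret the value as a double pointer and dereference it (although this is strongly discouraged and only works under the constraint that unsigned long long and double have the same size):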

unsigned long long x1n = 0x7ff0000000000000ULL; // Inf
double x1 = *((double*)&x1n);
unsigned long long x2n = 0x7ff0000000000001ULL; // Signaling NaN
double x2 = *((double*)&x2n);

printf("\nx1=%f, x2=%f sizeof(double) = %u", x1, x2, (unsigned)sizeof(x2)); // sizeof yields size_t, so cast it to match %u
if (x1 == x2)
    printf("\nx1==x2");
else
    printf("\nx1!=x2"); // x1 != x2

Example on ideone

Answer 4

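You have converted the constant 0x7ff00... to double. This is not at all the same as taking the bit representation of that value and interpreting it as a double.

This also explains why x1==x2. When you convert to double, you lose precision, so for sufficiently large integers the double you end up with is the same in both cases. This gives you some weird effects where, for a large floating-point value, adding 1 leaves it unchanged.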
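The rounding can be checked with a small sketch, assuming IEEE 754 doubles with a 53-bit significand; the function names are invented for this example. Near 2^63 adjacent doubles are 1024 apart, so both constants from the question convert to the same value, 2047 * 2^52 = 9218868437227405312.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>    /* DBL_MANT_DIG */
#include <stdbool.h>
#include <stdint.h>

/* Assumption: double has the 53-bit significand of IEEE 754 binary64. */
static_assert(DBL_MANT_DIG == 53, "expected IEEE 754 double precision");

/* Value conversion, as done by the cast in the question: the integer is
   rounded to the nearest representable double. The volatile store forces
   that rounding even on FPUs that keep results in extended precision. */
static double value_to_double(uint64_t n)
{
    volatile double d = (double)n;
    return d;
}

/* True if adding 1.0 leaves d unchanged after rounding to double. */
static bool one_is_absorbed(double d)
{
    volatile double sum = d + 1.0;
    return sum == d;
}
```

Here value_to_double(0x7ff0000000000000ULL) and value_to_double(0x7ff0000000000001ULL) compare equal, and one_is_absorbed returns true for that value but false for a small value such as 1.0.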


Answer 1
20 2014-11-01 12:33:44

Answer 2
8 2014-11-01 11:06:12

Answer 3
7 2014-11-01 11:19:49

Answer 4
6 2014-11-01 11:06:54
