Strict aliasing, -ffast-math and SSE

Question

Consider the following program:

#include <iostream>
#include <cmath>
#include <cstring>
#include <xmmintrin.h>

using namespace std;

int main()
{
    // 4 float32s.
    __m128 nans;
    // Set them all to 0xffffffff which should be NaN.
    memset(&nans, 0xff, 4*4);

    // cmpord should return a mask of 0xffffffff for any non-NaNs, and 0x00000000 for NaNs.
    __m128 mask = _mm_cmpord_ps(nans, nans);
    // AND the mask with nans to zero any of the nans. The result should be 0x00000000 for every component.
    __m128 z = _mm_and_ps(mask, nans);

    cout << z[0] << " " << z[1] << " " << z[2] << " " << z[3] << endl;

    return 0;
}

If I compile it with Apple Clang 7.0.2, with or without -ffast-math, I get the expected output 0 0 0 0:

$ clang --version
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin14.5.0
Thread model: posix

$ clang test.cpp -o test
$ ./test
0 0 0 0 

$ clang test.cpp -ffast-math -o test
$ ./test 
0 0 0 0

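But after updating to 8.1.0 (sorry, I don't know which actual version of Clang this corresponds to; Apple no longer publishes that information), -ffast-math seems to break this: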

$ clang --version
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ clang test.cpp -o test
$ ./test
0 0 0 0 

$ clang test.cpp -ffast-math -o test
$ ./test 
nan nan nan nan

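I suspect this is because of strict aliasing rules or something similar. Can anyone explain this behavior?

Edit: I forgot to mention that if you do nans = { std::nanf(nullptr), ... it works fine.

Looking at Godbolt, it seems the behavior changed between Clang 3.8.1 and Clang 3.9: the latter removes the cmpordps instruction. GCC 7.1 seems to leave it in.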

Answer 1

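This is not a strict aliasing issue. If you read the documentation for -ffast-math, you will see your problem:

Enable fast-math mode. This defines the __FAST_MATH__ preprocessor macro, and lets the compiler make aggressive, potentially-lossy assumptions about floating-point math. These include: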

[...]

Operands to floating-point operations are not equal to NaN and Inf, and

[...]

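-ffast-math allows the compiler to assume that a floating-point number is never NaN (because it sets the -ffinite-math-only option). With that assumption, _mm_cmpord_ps(nans, nans) can be folded to an all-ones mask, so the AND returns the NaNs unchanged. Since clang tries to match gcc's options, we can read a bit from GCC's option documentation to better understand what -ffinite-math-only does:

Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.

This option is not turned on by any -O option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

So if your code needs to use NaN, you cannot use -ffast-math or -ffinite-math-only. Otherwise you run the risk of the optimizer breaking your code, as you are seeing here.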
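If you have to keep -ffast-math for other reasons, one possible workaround is to do the NaN test on the raw bit patterns with integer SSE2 instructions, so that no floating-point comparison is involved. The following is only a sketch: the function names zero_nans_bitwise and lanes_as_bits are made up for illustration, it assumes SSE2 is available, and I have not checked it against every compiler version, so inspect the generated assembly before relying on it.

```cpp
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2: integer intrinsics and the ps/si128 casts

// Hypothetical helper (not from the question): returns a copy of v in which
// every NaN lane is replaced by +0.0f. The test is done on the bit patterns
// with integer instructions instead of a floating-point compare.
static inline __m128 zero_nans_bitwise(__m128 v)
{
    const __m128i bits = _mm_castps_si128(v);
    // Clear the sign bit, leaving only exponent and mantissa.
    const __m128i magnitude = _mm_and_si128(bits, _mm_set1_epi32(0x7fffffff));
    // IEEE 754 binary32: a NaN has an all-ones exponent and a non-zero
    // mantissa, i.e. its magnitude is greater than 0x7f800000 (infinity).
    // Both operands are non-negative, so a signed compare is sufficient.
    const __m128i is_nan = _mm_cmpgt_epi32(magnitude, _mm_set1_epi32(0x7f800000));
    // (~is_nan) & v keeps the non-NaN lanes and zeroes the NaN lanes.
    return _mm_andnot_ps(_mm_castsi128_ps(is_nan), v);
}

// Hypothetical helper: copies the four lanes out as raw 32-bit patterns so
// the result can be inspected without any floating-point comparison.
static inline void lanes_as_bits(__m128 v, std::uint32_t out[4])
{
    std::memcpy(out, &v, sizeof(std::uint32_t) * 4);
}
```

In the program from the question, z = zero_nans_bitwise(nans) would replace the _mm_cmpord_ps and _mm_and_ps pair. Infinities and ordinary values pass through unchanged, because only magnitudes strictly above 0x7f800000 are masked out.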

Answer 1 (15, accepted, 2017-05-23 12:31:50)
