
Should the Results of Trig Functions Be Hardware Dependent with the Same OS / Compiler?

I have a simple C++ program (foo.cxx):

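#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
#include <math.h>

// Print the raw bit pattern of a double. memcpy avoids the strict-aliasing
// violation of reading a double through a long int pointer.
static void print_bits(const char *name, double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    printf("The bits of %s are %" PRIx64 "\n", name, bits);
}

int main()
{
    double ang2 = -0.23202523431296057;
    print_bits("ang2", ang2);

    double sin_ang2 = sin(ang2);
    printf("sin_ang2 is %0.17f\n", sin_ang2);
    print_bits("sin_ang2", sin_ang2);
}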

I have two machines with different hardware, both running Ubuntu 20.04 and both using gcc 9.3.0. On both machines, I compile the code above with this command:

g++ -ffloat-store foo.cxx

On machine 1, running the program produces:

The bits of ang2 are bfcdb300bc9c468a
sin_ang2 is -0.22994895724656178
The bits of sin_ang2 are bfcd6ef7a98fc7ce

On machine 2, running the program produces:

The bits of ang2 are bfcdb300bc9c468a
sin_ang2 is -0.22994895724656181
The bits of sin_ang2 are bfcd6ef7a98fc7cf

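Note that the results of calling sin() differ slightly between the two machines. My question is whether this should be expected. I realize that floating-point arithmetic has many nuances that lead to imprecise results, but is this one of them? My understanding is that gcc's -ffloat-store option can help provide consistent results across machines, although it does not seem to help here: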

-ffloat-store

 Do not store floating-point variables in registers, and inhibit other options that might change whether a floating-point value is taken from a register or memory. This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.

The hardware of machine 1 (lscpu) is:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   36 bits physical, 48 bits virtual
CPU(s):                          4
...
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           58
Model name:                      Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz
...
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
                                 pge mca cmov pat pse36 clflush dts acpi mmx fxsr
                                 sse sse2 ss ht tm pbe syscall nx rdtscp lm
                                 constant_tsc arch_perfmon pebs bts rep_good nopl
                                 xtopology nonstop_tsc cpuid aperfmperf pni
                                 pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
                                 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic
                                 popcnt tsc_deadline_timer aes xsave avx f16c
                                 rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb
                                 stibp tpr_shadow vnmi flexpriority ept vpid
                                 fsgsbase smep erms xsaveopt dtherm ida arat pln
                                 pts md_clear flush_l1d

The hardware of machine 2 is:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          16
...
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           158
Model name:                      Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
...
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
                                 pge mca cmov pat pse36 clflush dts acpi mmx fxsr
                                 sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
                                 constant_tsc art arch_perfmon pebs bts rep_good
                                 nopl xtopology nonstop_tsc cpuid aperfmperf pni
                                 pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
                                 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2
                                 x2apic movbe popcnt tsc_deadline_timer aes xsave
                                 avx f16c rdrand lahf_lm abm 3dnowprefetch
                                 cpuid_fault epb invpcid_single ssbd ibrs ibpb
                                 stibp ibrs_enhanced tpr_shadow vnmi flexpriority
                                 ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep
                                 bmi2 erms invpcid mpx rdseed adx smap clflushopt
                                 intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida
                                 arat pln pts hwp hwp_notify hwp_act_window hwp_epp
                                 md_clear flush_l1d arch_capabilities

Are there any suggestions for getting consistent results on these two machines?

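These CPUs are not different enough to produce different results for the same instructions. The difference you are seeing comes from different implementations of sin in libc. The implementation is selected dynamically at load time by the dynamic linker, based on what your CPU supports (__sin_avx or __sin_fma). According to the lscpu output in the question, machine 1 has avx but not fma, while machine 2 has fma, so the two machines most likely run different code paths for the same call. -ffloat-store does not help here because it only affects how your own code is compiled, not the precompiled sin inside the math library.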

There is no straightforward way to disable this; see: Disable AVX-optimized functions in glibc (LD_HWCAP_MASK, /etc/ld.so.nohwcap) for valgrind & gdb record
