Assembly differences between unrolled for-loops cause differing float results

Consider the setup below:

typedef struct
{
    float d;
} InnerStruct;

typedef struct
{
    InnerStruct **c;
} OuterStruct;


float TestFunc(OuterStruct *b)
{
    float a = 0.0f;
    for (int i = 0; i < 8; i++)
        a += b->c[i]->d;
    return a;
}

The for loop in TestFunc exactly replicates one in another function that I'm testing. Both loops are unrolled by gcc (4.9.2) but yield slightly different assembly after doing so.

Assembly for my test loop:                  Assembly for the original loop:

lwz       r9,-0x725C(r13)                   lwz       r9,0x4(r3)    
lwz       r8,0x4(r9)                        lwz       r8,0x8(r9)    
lwz       r10,0x0(r9)                       lwz       r10,0x4(r9)   
lwz       r11,0x8(r9)                       lwz       r11,0x0C(r9)  
lwz       r4,0x4(r8)                        lwz       r3,0x4(r8)    
lwz       r10,0x4(r10)                      lwz       r10,0x4(r10)  
lwz       r8,0x4(r11)                       lwz       r0,0x4(r11)   
lwz       r11,0x0C(r9)                      lwz       r11,0x10(r9)  
efsadd    r4,r4,r10                         efsadd    r3,r3,r10
lwz       r10,0x10(r9)                      lwz       r8,0x14(r9)   
lwz       r7,0x4(r11)                       lwz       r10,0x4(r11)  
lwz       r11,0x14(r9)                      lwz       r11,0x18(r9)  
efsadd    r4,r4,r8                          efsadd    r3,r3,r0
lwz       r8,0x4(r10)                       lwz       r0,0x4(r8)    
lwz       r10,0x4(r11)                      lwz       r8,0x0(r9)    
lwz       r11,0x18(r9)                      lwz       r11,0x4(r11)  
efsadd    r4,r4,r7                          efsadd    r3,r3,r10
lwz       r9,0x1C(r9)                       lwz       r10,0x1C(r9)  
lwz       r11,0x4(r11)                      lwz       r9,0x4(r8)    
lwz       r9,0x4(r9)                        efsadd    r3,r3,r0
efsadd    r4,r4,r8                          lwz       r0,0x4(r10)   
efsadd    r4,r4,r10                         efsadd    r3,r3,r11
efsadd    r4,r4,r11                         efsadd    r3,r3,r9
efsadd    r4,r4,r9                          efsadd    r3,r3,r0

The issue is that the float values these instructions return are not exactly the same, and I can't change the original loop. I need to modify the test loop somehow so that it returns the same values. I believe the test's assembly is equivalent to just adding each element one after another. I'm not very familiar with assembly, so I wasn't sure how the above differences translate into C. I know this is the issue because if I add a print to the loops, they don't unroll and the results match exactly as expected.
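As a reading aid, here is one way the two columns might be written out in C. Tracing the loads and `efsadd` instructions by hand suggests that the left column adds the elements in index order, while the right column adds elements 1 through 6 first and only then elements 0 and 7. This is a hand trace rather than compiler output, so treat the order as an assumption. The names `SumInOrder` and `SumReordered` are hypothetical, and the plain `float` array stands in for `b->c[i]->d`.

```c
#include <stddef.h>

/* Hypothetical helper: adds the eight values strictly in index order,
 * which is what the left-hand (test) column appears to do. */
float SumInOrder(const float v[8])
{
    float a = v[0];
    for (size_t i = 1; i < 8; i++)
        a += v[i];
    return a;
}

/* Hypothetical helper: adds elements 1..6 first, then element 0, then
 * element 7. This order comes from a hand trace of the right-hand
 * (original) column and is an assumption, not verified compiler output. */
float SumReordered(const float v[8])
{
    float a = v[1];
    for (size_t i = 2; i < 7; i++)
        a += v[i];
    a += v[0];
    a += v[7];
    return a;
}
```

Float addition is not associative, so the two orders can round differently. For example, with `v[0] = 67108864.0f` (2^26) and the other seven elements equal to `1.0f`, the in-order sum stays at 67108864 while the reordered sum comes out as 67108872 when evaluated in IEEE single precision. GCC is only permitted to reassociate additions like this under options such as `-ffast-math` (which enables `-fassociative-math`), and with those options enabled it is free to reorder this C code again, so writing the order out by hand does not pin it down.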

I presume this is for unit-testing one function against another.

In general, floating-point calculations in C or C++ are not guaranteed to be exact or bit-for-bit reproducible across compilers and optimization settings, and it is not usually considered legitimate to expect them to be.

The Java language standard requires exactly reproducible floating-point results. This is a constant source of hatred against Java, with various accusations that making the results reproducible usually makes them less accurate and sometimes makes the code much slower too.

If you are doing your testing in C or C++ then I would suggest this approach:

Calculate the result as best you can, with both high precision and high accuracy. In this case the input data are 32-bit floats, so convert them all to 64-bit doubles before calculating the expected result.
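A minimal sketch of that reference calculation might look like the following. The function name `ReferenceSum` and the flat `float` array are assumptions made for illustration; in the question the values would come from `b->c[i]->d`.

```c
#include <stddef.h>

/* Hypothetical reference: widens each 32-bit input to a 64-bit double
 * before adding, so the rounding error of each addition is much smaller
 * than it would be in float arithmetic. */
double ReferenceSum(const float *values, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += (double)values[i];
    return sum;
}
```

For eight inputs consisting of 67108864.0f followed by seven 1.0f values, this returns exactly 67108871.0, a value that a float accumulator cannot represent.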

If the inputs were in double (and you don't have a bigger long double type) then sort the values by magnitude and add them up from smallest to largest. This tends to keep the loss of accuracy small.
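The sorted summation could be sketched as below. `CompareMagnitude` and `SortedSum` are hypothetical names, and sorting by absolute value (rather than by signed value) is an assumption about what "smallest to largest" is meant to achieve.

```c
#include <stdlib.h>

/* qsort comparator: orders doubles by increasing absolute value. */
static int CompareMagnitude(const void *lhs, const void *rhs)
{
    double a = *(const double *)lhs;
    double b = *(const double *)rhs;
    if (a < 0.0) a = -a;
    if (b < 0.0) b = -b;
    return (a > b) - (a < b);
}

/* Hypothetical helper: sorts the array in place by magnitude, then adds
 * from smallest to largest. Note that it reorders the caller's array. */
double SortedSum(double *values, size_t count)
{
    qsort(values, count, sizeof values[0], CompareMagnitude);
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    return sum;
}
```

With the inputs 2^53, 1.0 and 1.0, adding in the given order yields 2^53 because each 1.0 is rounded away, while the sorted order first forms 2.0 and then yields 2^53 + 2.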

Once you have your expected result, test that the function output matches it within some bounds.

There are two approaches to setting the accuracy you require to consider the test a pass:

One approach is to check what the real physical meaning of the number is and what accuracy you actually require.

The other approach is to just require that the result is accurate to within a few least-significant bits of the ideal result, i.e. that the error is less than a small multiple of the ideal result times FLT_EPSILON.
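The second approach might be coded roughly like this. The name `NearlyEqual` and the `factor` parameter (the "few times" multiplier) are hypothetical, and the expected value is assumed to have been computed in double.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical check: passes when |actual - expected| is at most
 * factor * |expected| * FLT_EPSILON. The comparison is done in double
 * so the check itself adds no meaningful rounding error. */
bool NearlyEqual(float actual, double expected, int factor)
{
    double error = (double)actual - expected;
    if (error < 0.0)
        error = -error;
    double magnitude = expected < 0.0 ? -expected : expected;
    return error <= (double)factor * magnitude * (double)FLT_EPSILON;
}
```

A test would then call something like `NearlyEqual(TestedFunction(b), expected, 4)`, where `TestedFunction` is a placeholder for the function under test. Because the tolerance scales with the expected value, this check only accepts an exact match when the expected result is zero, so sums that cancel to nearly zero may need an absolute bound instead.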

Disabling fast-math seems to fix this issue. Thanks to @njuffa for the suggestion. I was hoping to be able to design the test function around this optimization, but it doesn't seem to be possible. At least I know what the issue is now. I appreciate everyone's help with the problem!
