Can I create a union with a formal parameter passed to a function in C++?

The function below calculates the absolute value of a 32-bit floating-point value:

__forceinline static float Abs(float x)   // __forceinline is MSVC-specific
{
    union {
        float x;
        int a;
    } u;
    //u.x = x;
    u.a &= 0x7FFFFFFF;   // clear bit 31, the sign bit
    return u.x;
}
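The mask works because IEEE-754 single precision (binary32) keeps the sign in bit 31, so clearing that bit leaves the magnitude unchanged. A small sketch that inspects the bit patterns, assuming float is 32-bit IEEE-754 (the static_asserts check this); the names float_bits and kSignClearMask are made up for illustration:

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is a 32-bit IEEE-754 (binary32) type, as on x86/x64.
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE-754");

// Hypothetical name: every bit set except bit 31, the sign bit.
constexpr std::uint32_t kSignClearMask = 0x7FFFFFFFu;

// Hypothetical helper: copy the float's object representation into an integer.
inline std::uint32_t float_bits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Worked example: -1.5f = sign 1 | exponent 0x7F (biased 127, i.e. 2^0) | mantissa 0x400000 (.1b)
//   float_bits(-1.5f)                  == 0xBFC00000
//   float_bits(-1.5f) & kSignClearMask == 0x3FC00000 == float_bits(1.5f)
```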

The union u declared in the function holds a member x, which is different from the x that is passed as a parameter to the function. Is there any way to create a union from the function's argument x?

Is there any reason why the function above, with that line uncommented, would take longer to execute than this one?

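__forceinline float fastAbs(float a)
{
    int b= *((int *)&a) & 0x7FFFFFFF;   // reinterpret the float's bits as an int and clear the sign bit
    return *((float *)(&b));           // reinterpret the int's bits back as a float
}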

I'm trying to figure out the best way to take the absolute value of a floating-point value with as few memory reads/writes as possible.

For the first question, I'm not sure why you can't just do what you want with an assignment. The compiler will do whatever optimizations can be done.
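A minimal sketch of that idea: initialize the union directly from the argument instead of assigning afterwards. The function name AbsUnion is hypothetical. Note the caveat in the comments: reading a union member other than the one last written is fine in C, and GCC documents it as supported (Clang and MSVC behave the same in practice), but it is not sanctioned by standard C++.

```cpp
#include <cstdint>

// Hypothetical name. Aggregate initialization of a union initializes its
// first member, so u.f starts out equal to the parameter x.
// Caveat: reading u.a after writing u.f is a compiler extension in C++.
inline float AbsUnion(float x)
{
    union {
        float f;
        std::uint32_t a;
    } u = { x };          // u.f = x
    u.a &= 0x7FFFFFFFu;   // clear bit 31, the sign bit
    return u.f;
}
```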

In your second code sample, you violate strict aliasing, so it isn't the same.
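For reference, one well-defined way to write the second function is to copy the bits with std::memcpy instead of casting pointers. This is a sketch with a hypothetical name; with optimization enabled, major compilers typically reduce these fixed-size copies to plain register moves. In C++20, std::bit_cast from <bit> expresses the same conversion in a single expression.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Hypothetical name: fastAbs rewritten without the strict-aliasing violation.
inline float FastAbsMemcpy(float a)
{
    std::uint32_t b;
    std::memcpy(&b, &a, sizeof b);   // float bits -> integer
    b &= 0x7FFFFFFFu;                // clear the sign bit
    std::memcpy(&a, &b, sizeof a);   // integer bits -> float
    return a;
}
```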

As for why it's slower:

It's because CPUs today tend to have separate integer and floating-point units. By type-punning like that, you force the value to be moved from one unit to the other. This has overhead. (This is often done through memory, so you have extra loads and stores.)

In the second snippet, a, which is originally in the floating-point unit (either the x87 FPU or an SSE register), needs to be moved into the general-purpose registers to apply the mask 0x7FFFFFFF. Then it needs to be moved back.

In the first snippet, the compiler is probably smart enough to load the value directly into the integer unit, so you bypass the FPU in the first stage.

(I can't be 100% sure until you show us the assembly. It will also depend heavily on whether the parameter starts off in a register or on the stack, and on whether the output is used immediately by another floating-point operation.)
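If the goal is to avoid moving the value between units at all, one option is to apply the mask inside the SSE register itself. The sketch below assumes an x86 target with SSE (the preprocessor check) and falls back to std::fabs elsewhere; the name AbsSse is hypothetical. An AND with a sign-clearing mask in an XMM register is roughly what compilers typically emit for fabsf when targeting SSE.

```cpp
#include <cmath>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
// x86 only: the value stays in an SSE register; ANDNPS computes (~sign) & v,
// which clears bit 31 without a trip through the integer registers.
inline float AbsSse(float x)
{
    const __m128 sign = _mm_set1_ps(-0.0f);        // only bit 31 set in each lane
    const __m128 v    = _mm_set_ss(x);             // x in the lowest lane
    return _mm_cvtss_f32(_mm_andnot_ps(sign, v));  // extract the lowest lane
}
#else
// Fallback for non-SSE targets: let the compiler choose the instruction.
inline float AbsSse(float x) { return std::fabs(x); }
#endif
```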

Looking at the disassembly of the code compiled in release mode, the difference is quite clear! I removed the inlining and used two virtual functions so that the compiler would not optimize too much, which lets us show the differences.

This is the first function.

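013D1002  in          al,dx         // Listing artifact: byte ECh is the tail of the prologue's mov ebp,esp (8Bh ECh), not a real instruction.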
            union {
                float x;
                int a;
            } u;
            u.x = x;
013D1003  fld         dword ptr [x]   // Loads the float onto the top of the FPU stack.
013D1006  fstp        dword ptr [x]   // Pops the top of the FPU stack and stores it back to the same slot (u shares the parameter's stack slot).
            u.a &= 0x7FFFFFFF;
013D1009  and         dword ptr [x],7FFFFFFFh  // Performs a 32-bit bitwise AND directly on the value in memory.
            return u.x;
013D1010  fld         dword ptr [x]  // Loads the result onto the top of the FPU stack.
        }

This is the second function.

013D1020  push        ebp                       // Standard function entry; I'm using a virtual function here to show the difference.
013D1021  mov         ebp,esp
            int b= *((int *)&a) & 0x7FFFFFFF;
013D1023  mov         eax,dword ptr [a]         // Loads our parameter into eax.
013D1026  and         eax,7FFFFFFFh             // Performs a 32-bit bitwise AND between the register and the constant.
013D102B  mov         dword ptr [a],eax         // Stores the register back to memory (b reuses a's stack slot).
            return *((float *)(&b));
013D102E  fld         dword ptr [a]             // Loads the result onto the top of the FPU stack.

The number of floating-point operations and the use of the FPU stack are greater in the first case. The functions are executing exactly what you asked, so this is no surprise. So I expect the second function to be faster.

Now... with virtual removed and inlining allowed, things are a little different. It is hard to reproduce the disassembly here because, of course, the compiler does a good job, but I repeat: if the values are not constants, the compiler will use more floating-point operations in the first function. Of course, integer operations are generally faster than floating-point operations.

Are you sure that directly using the math.h abs function (fabs/fabsf for floating-point values) is slower than your method? If it is correctly inlined, it will compile to just this!

00D71016  fabs  
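For comparison, a minimal sketch using the standard function; the wrapper name AbsStd is made up. One pitfall worth knowing: in C, abs from <stdlib.h> takes an int, so for a float you want fabsf in C, or std::fabs / std::abs from <cmath> in C++, which have float overloads.

```cpp
#include <cmath>

// Hypothetical wrapper: defer to the standard library. <cmath> declares
// std::fabs(float) and a std::abs(float) overload, so the argument is never
// converted to int. Optimizing compilers typically inline this to a single
// instruction (fabs on x87, or an AND with a sign-clearing mask on SSE).
inline float AbsStd(float x)
{
    return std::fabs(x);
}
```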

Micro-optimizations like this are hard to see in long code, but if your function is called within a long chain of floating-point operations, fabs will work better, since the values will already be on the FPU stack or in SSE registers! abs would be faster and better optimized by the compiler.

You cannot measure the performance of optimizations like these by running a loop over an isolated piece of code; you must see how the compiler mixes everything together in the real code.

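Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.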

 
© 2020-2024 STACKOOM.COM