
Avoiding denormal values in C++

After a long search for a performance bug, I read about denormal floating-point values.

Apparently denormalized floating-point values can be a major performance concern, as illustrated in this question: Why does changing 0.1f to 0 slow down performance by 10x?

I have an Intel Core 2 Duo and I am compiling with gcc, using -O2.

So what do I do? Can I somehow instruct g++ to avoid denormal values? If not, can I somehow test if a float is denormal?

Wait. Before you do anything, do you actually know that your code is encountering denormal values, and that they're having a measurable performance impact?

Assuming you know that, do you know if the algorithm(s) that you're using is stable if denormal support is turned off? Getting the wrong answer 10x faster is not usually a good performance optimization.

Those issues aside:

  • If you want to detect denormal values to confirm their presence, you have a few options. If you have a C99 standard library or Boost, you can use the fpclassify macro. Alternatively, you can compare the absolute values of your data to the smallest positive normal number.

  • You can set the hardware to flush denormal values to zero (FTZ), or treat denormal inputs as zero (DAZ). The easiest way, if it is properly supported on your platform, is probably to use the fesetenv() function in the C header fenv.h. However, this is one of the least widely supported features of the C standard, and is inherently platform-specific anyway. You may want to just use some inline assembly to directly set the FPU state to (DAZ/FTZ).

You can test whether a float is denormal using:

#include <cmath>

if ( std::fpclassify( flt ) == FP_SUBNORMAL ) {
    // it's denormalized (std::fpclassify requires C++11)
}

(Caveat: I'm not sure that this will execute at full speed in practice.)

In C++03, the following code has worked for me in practice:

#include <cmath>
#include <limits>

if ( flt != 0 && std::fabs( flt ) < std::numeric_limits<float>::min() ) {
    // it's denormalized
}
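For reference, both checks can be wrapped in small helpers. This is a minimal sketch, assuming a C++11 compiler for std::fpclassify; the names is_denormal_classify and is_denormal_compare are made up for illustration and do not come from any library.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper names, used only for this illustration.

// Classification-based check: relies on std::fpclassify from <cmath> (C++11).
inline bool is_denormal_classify(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}

// Comparison-based check: nonzero, but smaller in magnitude than the
// smallest positive normal float. NaN fails the '<' comparison, so it is
// not reported as denormal.
inline bool is_denormal_compare(float x) {
    return x != 0.0f && std::fabs(x) < std::numeric_limits<float>::min();
}
```

Either helper can be dropped into a debug-only pass over your data to count how many values are denormal before you decide whether FTZ/DAZ is worth the trouble.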

To decide where to apply this, you may use a sample-based profiler like Shark, VTune, or Zoom to highlight the instructions slowed by denormal values. Micro-optimization, even more than other optimizations, is totally hopeless without analysis both before and after.

Most math coprocessors have an option to truncate denormal values to zero. On x86 it is the FZ (Flush to Zero) flag in the MXCSR control register. Check your CRT implementation for a support function to set the control register. It ought to be in <float.h>, something resembling _controlfp(). The option bit usually has "FLUSH" in the #defined symbol.

Double-check your math results after you set this, which is something you ought to do anyway: getting denormals is a sign of health problems in the computation.

To enable flush-to-zero (FTZ) in gcc (assuming underflow exceptions are masked, which is the default):

#define CSR_FLUSH_TO_ZERO         (1 << 15)
unsigned csr = __builtin_ia32_stmxcsr();
csr |= CSR_FLUSH_TO_ZERO;
__builtin_ia32_ldmxcsr(csr);

In case it's not obvious from the names, __builtin_ia32_stmxcsr and __builtin_ia32_ldmxcsr are available only if you're targeting an x86 processor. ARM, Sparc, MIPS, etc. will each need separate platform-specific code with this approach.
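If you would rather not leave FTZ switched on for the whole program, one option is to scope it. The following is only a sketch under several assumptions: it uses the _mm_getcsr/_mm_setcsr intrinsics from <xmmintrin.h> rather than the GCC builtins, the class name ScopedFlushToZero and the helper half_of_smallest_normal are hypothetical, float arithmetic is assumed to run in SSE registers (the default on x86-64), and on targets without SSE the class compiles to a no-op.

```cpp
#include <limits>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr
#define SCOPED_FTZ_HAVE_MXCSR 1
#else
#define SCOPED_FTZ_HAVE_MXCSR 0
#endif

// Hypothetical RAII helper: sets the FTZ bit (bit 15) of MXCSR while the
// object is alive and restores the previous MXCSR value on destruction.
class ScopedFlushToZero {
public:
    static constexpr bool supported = (SCOPED_FTZ_HAVE_MXCSR != 0);

    ScopedFlushToZero() : saved_(0) {
#if SCOPED_FTZ_HAVE_MXCSR
        saved_ = _mm_getcsr();
        _mm_setcsr(saved_ | (1u << 15));
#endif
    }
    ~ScopedFlushToZero() {
#if SCOPED_FTZ_HAVE_MXCSR
        _mm_setcsr(saved_);
#endif
    }
    ScopedFlushToZero(const ScopedFlushToZero&) = delete;
    ScopedFlushToZero& operator=(const ScopedFlushToZero&) = delete;

private:
    unsigned saved_;
};

// Halves the smallest positive normal float. The volatile variables keep
// the compiler from folding the product or moving it across the MXCSR
// writes above.
inline float half_of_smallest_normal() {
    volatile float tiny = std::numeric_limits<float>::min();
    volatile float half = 0.5f;
    volatile float result = tiny * half;
    return result;
}
```

Declaring a ScopedFlushToZero at the top of a hot function limits the changed behaviour to that function (and anything it calls) on the current thread; MXCSR is per-thread state, so other threads are unaffected.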

Just as an addition to the other answers: if you actually have a problem with denormal floating-point values, you probably have a precision problem in addition to your performance issue.

It may be a good idea to check if you can restructure your computations to keep the numbers larger, so that you avoid losing both precision and performance.
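As one possible illustration of such a restructuring (not necessarily what applies to your code), a long product of small positive factors can be accumulated as a sum of logarithms, so the running value never gets near the denormal range. The function names below are hypothetical, and the sketch assumes every factor is strictly positive.

```cpp
#include <cmath>
#include <vector>

// Naive running product in float: with many small factors the intermediate
// values shrink through the denormal range and can end up as exactly zero.
inline float naive_product(const std::vector<float>& factors) {
    float p = 1.0f;
    for (float f : factors) p *= f;
    return p;
}

// Restructured version: accumulate the natural log of each factor instead.
// Assumes every factor is strictly positive.
inline double log_of_product(const std::vector<float>& factors) {
    double s = 0.0;
    for (float f : factors) s += std::log(static_cast<double>(f));
    return s;
}
```

With twenty factors of 1e-3f, for example, the naive float product underflows all the way to zero, while the log-domain sum stays an ordinary-sized number near -138.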

You apparently want the CPU modes called FTZ (Flush To Zero) and DAZ (Denormals Are Zero).

I found the information on an audio web site but their link to the Intel documentation was missing. They are control bits in the SSE MXCSR register rather than separate instructions, so they should work on AMD CPUs that support SSE/SSE2.

I don't know what you can do in GCC to force that on in a portable way. You can always write inline assembly code to set them though. You may have to force GCC to use only SSE2 for floating-point math (for example with -msse2 -mfpmath=sse on 32-bit x86).

