
How to avoid floating point exceptions in unused SIMD lanes

I like to run my code with floating point exceptions enabled. Under Linux I do this with:

feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );

So far so good.

The issue I am having is that the compiler (I use clang 8) sometimes decides to use SIMD instructions to do a scalar division. Fine: if that is faster, even for a single scalar, why not.

But the result is that an unused lane in the SIMD register can contain a zero.

And when the SIMD division is executed, a floating point exception is raised.

Does that mean that floating point exceptions cannot be used at all if you allow the compiler to use SSE/AVX extensions?

In my case, this line of C code:

float a0, min, a, d;
...
a0 = (min - a) / (d);

...is executed as:

divps  %xmm2,%xmm3

Which then triggers:

Thread 1 "noisetuner" received signal SIGFPE, Arithmetic exception.

I think you have found a bug in clang, or maybe in LLVM.

I have reproduced it: clang 10.0 emits the same code, i.e. it has that bug as well. Clearly, that vdivps instruction only has valid data in the first 2 lanes of the vectors, and in the upper 2 lanes it will compute 0.0 / 0.0, so you'll get a runtime exception if you unmask these exceptions in the mxcsr register like you're doing.

Microsoft, Intel and gcc don't emit divps for that code. If you can, switch to gcc and it should be fine.

Update: Clang 10+ has an option controlling such optimizations, -ffp-exception-behavior=maytrap; take a look: https://godbolt.org/z/WG7bEE
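
For code where you control the vector lanes yourself, the usual way to keep unused lanes from trapping is to pad them with values that cannot raise anything. The following is only a minimal sketch of that idea, not what clang generates: div_padded and LANES are hypothetical names, and the width of four floats is an assumption matching an SSE register.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* assumed vector width: four floats, as in an SSE register */

/* Hypothetical helper: divides n (<= LANES) numerator/denominator pairs.
 * Unused lanes are padded with 0.0f / 1.0f, so they never evaluate
 * 0.0 / 0.0 or x / 0.0 and cannot raise FE_INVALID or FE_DIVBYZERO. */
void div_padded(const float *num, const float *den, size_t n, float *out)
{
    float a[LANES], b[LANES], q[LANES];

    assert(n <= LANES);

    for (size_t i = 0; i < LANES; ++i) {
        a[i] = (i < n) ? num[i] : 0.0f;  /* harmless numerator */
        b[i] = (i < n) ? den[i] : 1.0f;  /* harmless, nonzero denominator */
    }

    /* Full-width loop that a compiler may turn into a single divps. */
    for (size_t i = 0; i < LANES; ++i)
        q[i] = a[i] / b[i];

    /* Copy back only the lanes that carry real data. */
    for (size_t i = 0; i < n; ++i)
        out[i] = q[i];
}
```

Note that this does not help when the compiler itself turns scalar code into a vector division, which is the case in the question; for that, the compiler option above (or a different compiler) is the fix.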
