Tag [fast-math]: Newest Questions

Weird LTO behavior with -ffast-math

Summary: Recently I encountered a weird issue regarding LTO and -ffast-math where I got inconsistent results for my "pow" (in cmath) calls depending ...

C++ gcc: does the associative-math flag disable float NAN values?

I'm working with statistical functions on a lot of float data. I want it to run faster, but Ofast disables NAN (fno-finite-math-only flag), which is not ...

Can this piece of code be modified such that it works with fast-math enabled?

Can the code below be modified such that it works correctly even when compiled by GCC with fast-math enabled? Note: I have it in a header file and ...

What exactly does “denormal input” mean in assembly when we consider using the DAZ flag for SSE floating point?

I've read this article and do-denormal-flags-like-denormals-are-zero-daz-affect-comparisons-for-equality, and I understand the usage and difference bet ...

__host__ __device__ functions calling overloaded functions

I do not understand whether there is function overloading in CUDA or not. I want to explain my problem using the following two functions, which I want to ...

Can I determine at compile time whether --use_fast_math was set?

I'm writing some CUDA code, and I want it to behave differently based on whether or not --use_fast_math was set. And - I want to make that deci ...

Optimal implementation of iterative Kahan summation

Intro: Kahan summation / compensated summation is a technique that addresses compilers' inability to respect the associative property of numbers. Truncat ...

Does GCC's ffast-math have consistency guarantees across platforms or compiler versions?

I want to write cross-platform C/C++ which has reproducible behaviour across different environments. I understand that gcc's ffast-math enables vario ...

Do denormal flags like Denormals-Are-Zero (DAZ) affect comparisons for equality?

If I have 2 denormal floating point numbers with different bit patterns and compare them for equality, can the result be affected by the Denormals-Are ...

Strict aliasing, -ffast-math and SSE

Consider the following program: If I compile with Apple Clang 7.0.2 with and without -ffast-math, I get the expected output 0 0 0 0: However aft ...

Why is std::inner_product slower than the naive implementation?

This is my naive implementation of dot product: And this is using the C++ library: I ran some benchmarks (code is here: https://github.com/ijklr/ss ...

Can I make my compiler use fast-math on a per-function basis?

Suppose I have and I want to compile one instantiation with -ffast-math (--use-fast-math for nvcc), and the other instantiation without it. This ...

AVX code segfaults when compiled with -ffast-math?

I'm experimenting with writing a couple of kernels using GCC's built-in SIMD support. I've got this code benchmarking an AVX dot product kernel: Strange ...

What is the GCC/Clang equivalent of -fp-model fast=1 in ICC?

As I read on Intel's website: Intel compiler uses /fp-model fast=1 as defaults. This optimization favors speed over standards compliance. You may ...

Curious result from the gcc linker behaviour around -ffast-math

I've noticed an interesting phenomenon around flags to the compiler/linker affecting the running code in ways I cannot understand. I have a library t ...

Why does the -ffast-math option break my bool condition?

This is the critical section of the program that causes the problem, and the program is completely sequential. exist_ is a private bool member of a class, and dbl_num_ ...

Why does GCC or Clang not optimise reciprocal to 1 instruction when using fast-math?

Does anyone know why GCC/Clang will not optimise function test1 in the below code sample to use just the RCPPS instruction when using the fast- ...

How do I compile with “ffast-math”?

I'm trying to benchmark some Rust code, but I can't figure out how to set the "ffast-math" option. % rustc -C opt-level=3 -C llvm-args='-enable-unsaf ...

gcc -Ofast - complete list of limitations

I'm using the -Ofast gcc option in my program because of latency requirements. I wrote a simple test program: I've tried to run it with default flags and with ...

Does any floating-point-intensive code produce bit-exact results on any x86-based architecture?

I would like to know if any code in C or C++ using floating point arithmetic would produce bit exact results in any x86 based architecture, regardless ...