-DNDEBUG with -Ofast is slower than -Ofast alone

I'm in the process of optimizing a simple genetic algorithm and neural network, and I'm experimenting with some GCC options to generate faster executables.

In my code I have some assertions, such as:

#include <assert.h>

mat mat_add(mat a, mat b)
{
    /* Both operands must have the same shape. */
    assert(a->rows == b->rows);
    assert(a->cols == b->cols);
    mat m = mat_create(a->rows, a->cols);
    /* Element-wise sum: m[i][j] = a[i][j] + b[i][j]. */
    for(size_t i = 0; i < a->rows; i++) {
        for(size_t j = 0; j < a->cols; j++)
            mat_set(m, i, j, mat_get(a, i, j) + mat_get(b, i, j));
    }
    return m;
}

I figured that adding -DNDEBUG to disable the assertions would make the executable faster, because it would no longer check the conditions above. However, it is actually slower.
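For reference, the C standard specifies that when NDEBUG is defined at the point where <assert.h> is included, assert(expr) expands to ((void)0), so the condition is not even evaluated. The small probe below makes that visible; the names assert_probe_calls, always_true and assert_evaluations are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Number of times the asserted expression has been evaluated. */
static int assert_probe_calls = 0;

/* Always true, but records that it was called. */
int always_true(void)
{
    assert_probe_calls++;
    return 1;
}

/* Returns 1 in a normal build and 0 when compiled with -DNDEBUG,
   because under NDEBUG assert(expr) expands to ((void)0) and expr
   is never evaluated. */
int assert_evaluations(void)
{
    assert_probe_calls = 0;
    assert(always_true());
    return assert_probe_calls;
}
```

Built normally, assert_evaluations() returns 1; built with -DNDEBUG it returns 0. So -DNDEBUG does remove the checks themselves, which is why the slowdown is surprising.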

Without -DNDEBUG:

$ gcc src/*.c -lm -pthread -Iinclude/ -Wall -Ofast
$ for i in $(seq 1 5); do time ./a.out; done

real    0m11.677s
user    1m28.786s
sys     0m0.729s

real    0m11.716s
user    1m29.304s
sys     0m0.723s

real    0m12.217s
user    1m31.707s
sys     0m0.806s

real    0m12.602s
user    1m32.863s
sys     0m0.726s

real    0m12.225s
user    1m30.915s
sys     0m0.736s

With -DNDEBUG:

$ gcc src/*.c -lm -pthread -Iinclude/ -Wall -Ofast -DNDEBUG
$ for i in $(seq 1 5); do time ./a.out; done

real    0m13.698s
user    1m42.533s
sys     0m0.792s

real    0m13.764s
user    1m43.337s
sys     0m0.709s

real    0m13.655s
user    1m42.986s
sys     0m0.739s

real    0m13.836s
user    1m43.138s
sys     0m0.719s

real    0m14.072s
user    1m43.879s
sys     0m0.712s

It isn't much slower, but it is noticeable and consistent: every -DNDEBUG run takes about 13.7 to 14.1 s of real time, versus 11.7 to 12.6 s without it.
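To put one number on the gap, here is a small sketch that averages the five real times of each build and expresses the difference as a percentage. The helper names mean and slowdown_percent are hypothetical; the data is copied from the runs above.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n must be > 0). */
double mean(const double *xs, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += xs[i];
    return sum / (double)n;
}

/* Relative slowdown of `slow` versus `fast`, in percent. */
double slowdown_percent(double fast, double slow)
{
    return (slow - fast) / fast * 100.0;
}

/* "real" times in seconds, copied from the five runs of each build. */
const double real_without_ndebug[5] = {11.677, 11.716, 12.217, 12.602, 12.225};
const double real_with_ndebug[5]    = {13.698, 13.764, 13.655, 13.836, 14.072};
```

Calling slowdown_percent(mean(real_without_ndebug, 5), mean(real_with_ndebug, 5)) gives means of about 12.09 s and 13.81 s, a slowdown of roughly 14%.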

What could be causing this slowdown?

Do the mat_set and mat_get functions perform their own bounds checks on the indices? With the asserts present, the loop is only reachable if b->rows == a->rows is true, because a failing assert never returns (in glibc its failure handler, __assert_fail, is declared noreturn). That allows the compiler to optimize out any i < b->rows check in the mat_get call for b, because it knows b->rows == a->rows and, from the loop condition, i < a->rows. With -DNDEBUG, assert expands to ((void)0), so that information is lost and the compiler may have to keep those checks inside the loop.
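The question does not show how mat, mat_create, mat_get and mat_set are defined. Below is a minimal sketch of the situation this answer has in mind, assuming (hypothetically) a heap-allocated row-major matrix whose accessors abort on out-of-range indices; mat_s and mat_free are likewise invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical matrix type: the question does not show its definition. */
typedef struct {
    size_t rows, cols;
    double *data; /* row-major storage */
} mat_s, *mat;

mat mat_create(size_t rows, size_t cols)
{
    mat m = malloc(sizeof *m);
    if (!m) abort();
    m->rows = rows;
    m->cols = cols;
    m->data = calloc(rows * cols, sizeof *m->data);
    if (!m->data && rows * cols != 0) abort();
    return m;
}

void mat_free(mat m)
{
    if (m) {
        free(m->data);
        free(m);
    }
}

/* Bounds-checked accessors: these index tests are the checks the
   compiler may be able to drop inside mat_add. */
static inline double mat_get(mat m, size_t i, size_t j)
{
    if (i >= m->rows || j >= m->cols) abort();
    return m->data[i * m->cols + j];
}

static inline void mat_set(mat m, size_t i, size_t j, double v)
{
    if (i >= m->rows || j >= m->cols) abort();
    m->data[i * m->cols + j] = v;
}

mat mat_add(mat a, mat b)
{
    /* Without NDEBUG, code past these lines may assume the shapes match. */
    assert(a->rows == b->rows);
    assert(a->cols == b->cols);
    mat m = mat_create(a->rows, a->cols);
    for (size_t i = 0; i < a->rows; i++)
        for (size_t j = 0; j < a->cols; j++)
            mat_set(m, i, j, mat_get(a, i, j) + mat_get(b, i, j));
    return m;
}
```

Inside mat_add, the checks on a follow from the loop bounds alone, but the checks on b are only provably redundant while the asserts are there. Whether GCC actually removes them in a given build is worth confirming by inspecting the generated assembly (for example with gcc -S).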

If this turns out to be the case, you can get the same effect without assertions, and without any runtime branch, by adding the following (a GNU C feature):

/* Tell the compiler the shapes always match (undefined behavior if they don't). */
if (a->rows != b->rows || a->cols != b->cols)
    __builtin_unreachable();

A more portable but less reliable way to do this is to write some nonsensical undefined behavior, such as 1/0;, in the if body.
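If you want real checks in debug builds and only the optimizer hint in release builds, one common pattern is to wrap both in a macro. This is a sketch, not part of the original answer: ASSUME, struct dmat and dmat_add are hypothetical names, and __builtin_unreachable requires GCC or Clang. On compilers that support C23, the standard unreachable() macro from <stddef.h> serves the same purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ASSUME macro (not a standard or GCC name).
   Debug builds: an ordinary assert.
   NDEBUG builds: no diagnostic, but __builtin_unreachable still tells the
   optimizer that the condition holds. cond must be free of side effects. */
#ifdef NDEBUG
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) assert(cond)
#endif

/* Hypothetical minimal matrix, only to keep this example self-contained. */
struct dmat {
    size_t rows, cols;
    double *data; /* row-major storage */
};

/* Element-wise sum out = a + b; all three must have the same shape. */
void dmat_add(const struct dmat *a, const struct dmat *b, struct dmat *out)
{
    ASSUME(a->rows == b->rows && a->cols == b->cols);
    ASSUME(out->rows == a->rows && out->cols == a->cols);
    for (size_t i = 0; i < a->rows; i++)
        for (size_t j = 0; j < a->cols; j++)
            out->data[i * out->cols + j] =
                a->data[i * a->cols + j] + b->data[i * b->cols + j];
}
```

When NDEBUG is defined, a false condition becomes undefined behavior rather than a diagnosed error, so this trades safety for speed in exactly the same way as the __builtin_unreachable suggestion.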
