
-DNDEBUG with -Ofast is slower than -Ofast alone

I'm in the process of optimizing a simple genetic algorithm and neural network, and I'm experimenting with some GCC options to generate faster executables.

In my code I have some assertions, such as:

mat mat_add(mat a, mat b)
{
    assert(a->rows == b->rows);
    assert(a->cols == b->cols);
    mat m = mat_create(a->rows, a->cols);
    for(size_t i = 0; i < a->rows; i++) {
        for(size_t j = 0; j < a->cols; j++)
            mat_set(m, i, j, mat_get(a, i, j) + mat_get(b, i, j));
    }
    return m;
}

I figured that adding -DNDEBUG to disable the assertions would make the executable faster, since it would no longer check the conditions above. However, it is actually slower.

Without -DNDEBUG:

$ gcc src/*.c -lm -pthread -Iinclude/ -Wall -Ofast
$ for i in $(seq 1 5); do time ./a.out; done

real    0m11.677s
user    1m28.786s
sys     0m0.729s

real    0m11.716s
user    1m29.304s
sys     0m0.723s

real    0m12.217s
user    1m31.707s
sys     0m0.806s

real    0m12.602s
user    1m32.863s
sys     0m0.726s

real    0m12.225s
user    1m30.915s
sys     0m0.736s

With -DNDEBUG:

$ gcc src/*.c -lm -pthread -Iinclude/ -Wall -Ofast -DNDEBUG
$ for i in $(seq 1 5); do time ./a.out; done

real    0m13.698s
user    1m42.533s
sys     0m0.792s

real    0m13.764s
user    1m43.337s
sys     0m0.709s

real    0m13.655s
user    1m42.986s
sys     0m0.739s

real    0m13.836s
user    1m43.138s
sys     0m0.719s

real    0m14.072s
user    1m43.879s
sys     0m0.712s

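It isn't dramatically slower (about 13.8 s versus 12.1 s wall-clock time on average over the five runs, roughly 14%), but the difference is consistent and noticeable.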

What could be causing this slowdown?

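Do the mat_set and mat_get functions perform their own bounds checks on the indices? With the asserts present, the loop is only reachable if a->rows == b->rows and a->cols == b->cols are both true. Once mat_get is inlined, that allows the compiler to optimize out any check i < b->rows (or j < b->cols) in the mat_get call for b, because it knows b->rows == a->rows and, from the loop condition, i < a->rows. With -DNDEBUG those facts are no longer available, so the checks on b stay in the inner loop.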

If that turns out to be the cause, you can give the compiler the same information without assertions, and without any runtime branch, by adding (a GNU C extension):

if (a->rows != b->rows || a->cols != b->cols)
    __builtin_unreachable();

A more portable but less reliable way to do this is to write some deliberately undefined behavior, such as 1/0;, in the if body.
