Speed difference when using different values with modulo

Question

I'm using gcc 4.6.3 and creating a large array of random shorts. I'm generating them with the following statements:

val = SHRT_MAX; //as defined by limits.h
while(array<end) {
    *array++ = rand() % val;
}

This is fast: even for arrays as large as 5,000,000 elements it completes almost instantly. I was curious about my sort's efficiency with a smaller range of values, so I changed that to:

val = 3;

This caused a considerable speed difference: it ran much slower than the original statements. What is causing the slowdown?

Answer 1

SHRT_MAX is most likely greater than or equal to RAND_MAX. The statement:

*array++ = rand() % val;

can be optimized into:

int rand_value= rand();
if (rand_value==RAND_MAX) rand_value= 0;
*array++= rand_value;

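which is faster because it replaces a modulus with a branch. The rewrite relies on rand() returning values from 0 to RAND_MAX, so when val equals RAND_MAX only rand_value == RAND_MAX needs to wrap to 0. The second version, where val is 3, cannot be reduced to a simpler form that avoids the modulus.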

% SHRT_MAX cannot be simplified into a single bitwise operation. But combined with knowledge of how rand() is specified, the compiler can optimize statements that reduce rand() by values greater than or equal to RAND_MAX.
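As a minimal sketch of why the branch form gives the same result, the following compares the two reductions at the boundary values. It assumes the modulus is exactly RAND_MAX and that the input lies in the range 0 to RAND_MAX; the helper names are hypothetical, and nothing here confirms that gcc 4.6.3 actually performs this rewrite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: the plain modulus, for r in [0, RAND_MAX]. */
static int reduce_with_modulus(int r) { return r % RAND_MAX; }

/* Hypothetical helper: the branch form. Only r == RAND_MAX wraps to 0. */
static int reduce_with_branch(int r) { return (r == RAND_MAX) ? 0 : r; }

/* Asserts that both forms agree at the edges of rand()'s range. */
static void check_forms_agree(void) {
    const int samples[] = {0, 1, RAND_MAX / 2, RAND_MAX - 1, RAND_MAX};
    size_t i;
    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(reduce_with_modulus(samples[i]) == reduce_with_branch(samples[i]));
    }
}
```

Calling check_forms_agree() from main should pass silently. For any modulus smaller than RAND_MAX, such as 3, more than one input value wraps, so a single comparison is no longer enough.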

Answer 2

Compilers can optimize the calculation of a modulo (a % B) where B is a constant, replacing the actual modulo with simpler arithmetic operations. The details are explained in topics like "Most optimized way to calculate modulus in C". However, such optimizations are faster for some values of B than for others.

Even a CPU's division/modulo instruction can take a different number of cycles to complete (at least on some CPUs). See the numbers for x86 here: http://gmplib.org/~tege/x86-timing.pdf
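To illustrate the kind of replacement meant here, this sketch computes x % 3 for an unsigned 32-bit value with a multiply and a shift instead of a divide. The constant 0xAAAAAAAB is ceil(2^33 / 3), the usual reciprocal for unsigned 32-bit division by 3. The function name is hypothetical, and the exact sequence gcc emits (especially for signed int) may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x % 3 without a divide instruction (unsigned 32-bit only). */
static uint32_t mod3_multiply_shift(uint32_t x) {
    /* Multiply by ceil(2^33 / 3), then shift right by 33: q == x / 3. */
    uint32_t q = (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
    return x - q * 3u; /* remainder = x - 3 * quotient */
}

/* Asserts agreement with the built-in % operator on a few values. */
static void check_mod3(void) {
    const uint32_t samples[] = {0u, 1u, 2u, 3u, 4u, 1000000u,
                                2147483647u, 4294967295u};
    size_t i;
    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(mod3_multiply_shift(samples[i]) == samples[i] % 3u);
    }
}
```

The multiply, shift, multiply, subtract sequence is still several instructions, which is one reason a constant like 3 can cost more than a power of two.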

Answer 3

SHRT_MAX is a 2^n-1 value, which can be optimised for division. Dividing by 3 is much tougher, so the compiler may well decide to actually divide by 3 (or do some other magic operation that is slower than the 2^n-1 variant).
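One way a 2^n-1 modulus can avoid division: 2^n leaves remainder 1 when divided by 2^n-1, so the high bits can simply be added onto the low bits. The sketch below does this for n = 15 (32767), assuming a 32-bit int and a non-negative input. The name mod32767_fold is hypothetical, and this is not a claim about the code gcc generates.

```c
#include <assert.h>

/* x % 32767 for 0 <= x <= 2147483647, using shifts, masks and adds. */
static int mod32767_fold(int x) {
    x = (x & 0x7FFF) + (x >> 15); /* fold high bits down: at most 98302 */
    x = (x & 0x7FFF) + (x >> 15); /* fold again: at most 32768 */
    if (x >= 32767) {
        x -= 32767;               /* final correction into [0, 32766] */
    }
    return x;
}

/* Asserts agreement with the built-in % operator on a few values. */
static void check_fold(void) {
    assert(mod32767_fold(0) == 0);
    assert(mod32767_fold(32766) == 32766);
    assert(mod32767_fold(32767) == 0);
    assert(mod32767_fold(32768) == 1);
    assert(mod32767_fold(123456789) == 123456789 % 32767);
    assert(mod32767_fold(2147483647) == 2147483647 % 32767);
}
```

No comparable folding trick exists for 3 at this cost, which fits the point that some constants are cheaper than others.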

The fastest modulo you can use is by 2^n, which can be replaced with a single AND instruction for positive values: x % 256 is the same as x & 255. Unfortunately, when the value may be negative, it's not quite so easy...
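A small sketch of both cases, with a hypothetical helper name. It assumes C99 or later, where % truncates toward zero, and a two's complement int.

```c
#include <assert.h>

/* Mask form of x % 256; a valid replacement only when x >= 0. */
static int mod256_mask(int x) { return x & 255; }

/* Shows where the mask matches % and where it does not. */
static void check_mask(void) {
    /* Non-negative values: mask and modulo agree. */
    assert(mod256_mask(0) == 0 % 256);
    assert(mod256_mask(255) == 255 % 256);
    assert(mod256_mask(1000) == 1000 % 256);

    /* Negative values: % keeps the sign of the dividend, the mask does not. */
    assert(-1 % 256 == -1);
    assert(mod256_mask(-1) == 255);
}
```

This is why a compiler working on a signed int has to add a sign fix-up around the AND, while an unsigned operand needs only the AND.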

3 answers

solution1: 3, ACCEPTED, 2013-01-24 23:04:05
solution2: 2, 2013-01-24 22:52:48
solution3: 0, 2013-01-25 00:12:41