Why is the first call to atan much slower than the following ones?

The following code demonstrates that atan computation time can vary a lot:

#include <cstdio>
#include <cstdlib>
#include <cmath>

#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>

double get_time()
{
    struct timeval t;
    gettimeofday(&t, NULL); // the timezone argument is obsolete, so pass NULL
    return t.tv_sec + t.tv_usec*1e-6; // seconds, with microsecond granularity
}

int main() {
    double worst_time = 0.0;
    double best_time = 1e6;

    volatile double x0 = -M_PI/2.0;
    volatile double foo = atan(x0); // SLOW CALL HERE
    volatile double sum = 0.0; // volatile to avoid having the atan() call optimized away
    for (double x = x0; x < M_PI/3.0; x += 0.1) {
        volatile double y = x;
        const double start = get_time();
        asm volatile ("":::"memory"); // avoid reordering in -O3
        const double value = atan(y);
        asm volatile ("":::"memory"); // avoid reordering
        const double end = get_time();
        sum += value;

        const double delta = end - start;
        if (delta > worst_time) {
            worst_time = delta;
        }
        if (delta < best_time) {
            best_time = delta;
        }
        printf("* %f (value: %f)\n", delta, y);
    }

    printf("%f / %f\n", worst_time, best_time);

    printf("%f\n", foo);
}

On my machine, the worst time is around 15 µs, whereas the best time is 0 (too small to be measured).

The average time (not displayed here) on my machine is around 1 or 2 µs.
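The zero readings come from the microsecond granularity of gettimeofday. As a minimal sketch, assuming a POSIX system that provides clock_gettime with CLOCK_MONOTONIC, a finer-grained timer could look like the following. The name get_time_monotonic is hypothetical and not part of the code above.

```cpp
#include <time.h>   // clock_gettime, CLOCK_MONOTONIC (POSIX)

// Hypothetical replacement for get_time(): returns a monotonic timestamp
// in seconds, read with nanosecond granularity.
double get_time_monotonic()
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}
```

Only differences between two such timestamps are meaningful, since the starting point of the monotonic clock is unspecified.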

I tried different compilation flags (-O3, linking statically to libm, etc.) but I cannot find what makes the worst case so much slower. Any ideas?

edit: I am using Ubuntu 14.04 - gcc 4.8.4

edit2: replaced atan2 with atan. I am not interested in the fact that atan2 is defined piecewise and that different branches may take different times. I am interested in eliminating the outliers, which can appear even if atan is called instead of atan2.

edit3:

* 0.000015 (value: -1.570796)
* 0.000000 (value: -1.470796)
* 0.000001 (value: -1.370796)
* 0.000001 (value: -1.270796)
* 0.000000 (value: -1.170796)
* 0.000002 (value: -1.070796)
* 0.000000 (value: -0.970796)
* 0.000001 (value: -0.870796)
* 0.000000 (value: -0.770796)
* 0.000000 (value: -0.670796)
* 0.000001 (value: -0.570796)
* 0.000000 (value: -0.470796)
* 0.000003 (value: -0.370796)
* 0.000001 (value: -0.270796)
* 0.000000 (value: -0.170796)
* 0.000000 (value: -0.070796)
* 0.000001 (value: 0.029204)
* 0.000000 (value: 0.129204)
* 0.000002 (value: 0.229204)
* 0.000001 (value: 0.329204)
* 0.000000 (value: 0.429204)
* 0.000001 (value: 0.529204)
* 0.000001 (value: 0.629204)
* 0.000001 (value: 0.729204)
* 0.000001 (value: 0.829204)
* 0.000001 (value: 0.929204)
* 0.000000 (value: 1.029204)
0.000015 / 0.000000 / 0.000001

edit4:

It appears that the first call is the culprit! The call outside the loop was optimized away by the compiler. If we force atan to be evaluated outside of the loop for x0, all the calls are reasonably fast...

* 0.000000 (value: -1.570796)
* 0.000001 (value: -1.470796)
* 0.000000 (value: -1.370796)
* 0.000002 (value: -1.270796)
* 0.000001 (value: -1.170796)
* 0.000001 (value: -1.070796)
* 0.000000 (value: -0.970796)
* 0.000000 (value: -0.870796)
* 0.000000 (value: -0.770796)
* 0.000001 (value: -0.670796)
* 0.000000 (value: -0.570796)
* 0.000000 (value: -0.470796)
* 0.000006 (value: -0.370796)
* 0.000001 (value: -0.270796)
* 0.000002 (value: -0.170796)
* 0.000001 (value: -0.070796)
* 0.000000 (value: 0.029204)
* 0.000001 (value: 0.129204)
* 0.000003 (value: 0.229204)
* 0.000000 (value: 0.329204)
* 0.000000 (value: 0.429204)
* 0.000000 (value: 0.529204)
* 0.000001 (value: 0.629204)
* 0.000000 (value: 0.729204)
* 0.000000 (value: 0.829204)
* 0.000000 (value: 0.929204)
* 0.000000 (value: 1.029204)
0.000006 / 0.000000
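One way to force that evaluation outside the loop can be sketched as follows. This is only an illustration: warm_up_atan is a hypothetical helper, and it assumes that routing both the argument and the result through volatile variables is enough to keep the compiler from folding or dropping the call.

```cpp
#include <cmath>

// Hypothetical helper: evaluates atan once so that any first-call cost is
// paid before the timed region starts.
double warm_up_atan(double x)
{
    volatile double input = x;                  // hides the value from constant folding
    volatile double result = std::atan(input);  // volatile store keeps the call alive
    return result;
}
```

Calling warm_up_atan(x0) once before the loop, and printing or otherwise using its return value, plays the role of the foo variable in the code above.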

https://ideone.com/vtUuE6

The timing difference is actually caused by page faults (!). The first time the function is called, the page containing the atan code has not been accessed yet, so a page fault occurs and the kernel has to map the page in before the call can proceed. Calling mlockall() at startup should improve the situation.
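A minimal sketch of the mlockall() suggestion, assuming Linux or another POSIX system with <sys/mman.h>. The wrapper name lock_process_memory is hypothetical. Whether this removes the outlier entirely on a given machine is not guaranteed.

```cpp
#include <sys/mman.h>   // mlockall, MCL_CURRENT, MCL_FUTURE
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical wrapper: asks the kernel to lock every page currently mapped
// into the process (MCL_CURRENT) and every page mapped later (MCL_FUTURE)
// into RAM. Returns true on success, false (after reporting why) on failure.
bool lock_process_memory()
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        std::fprintf(stderr, "mlockall failed: %s\n", std::strerror(errno));
        return false;
    }
    return true;
}
```

The wrapper would be called once at the start of main(), before the timed loop. The call can fail with EPERM or ENOMEM when the process lacks the required privilege or would exceed its RLIMIT_MEMLOCK limit, so the return value is worth checking.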

atan2 is a piecewise function, i.e. for certain values or ranges of values it performs different operations. Some of these just return a constant value, which is quite fast, while others involve actual trigonometric calculations, which can take considerably longer. The particulars are available at https://en.wikipedia.org/wiki/Atan2
