Sieve of Eratosthenes C++ code speeds up on consecutive runs - why?

I had a quick check around for this question, but couldn't find an answer - although I guess it might have been brought up here before.

I was messing around writing a simple implementation of the sieve of Eratosthenes in C++ and timing the outcome:

#include <iostream>
#include <math.h>

int main() {

  int n = 100000;
  int seive[n];   // variable-length array: a compiler extension, not standard C++

  // Fill each slot with its index; slot i stands for the number i+1.
  for (int i=0; i<n; i++) {
    seive[i] = i;
  }

  // Cross out multiples of every candidate up to sqrt(n) by marking them -9.
  for (int i=2; i < ceil(sqrt(n)); i++) {
    for (int j=i*2; j<=n; j+=i) {
      seive[j-1] = -9;
    }
  }

  // Print every number that was never crossed out.
  for (int i=0; i<n; i++) {
    if (seive[i] != -9) {
      std::cout << i+1 << "\n";
    }
  }

  return 0;
}

I compile it using:

g++ seive.cpp -o seiveCpp

And then time it using:

time ./seiveCpp

First time:

./seiveCpp  0.01s user 0.01s system 10% cpu 0.184 total

Second time:

./seiveCpp  0.01s user 0.01s system 58% cpu 0.034 total

Third time:

./seiveCpp  0.01s user 0.01s system 59% cpu 0.037 total

etc.

If I repeat this multiple times, the first run is consistently around 5x slower than all the subsequent runs.

What is the reason behind this happening?

I am running this on a 2017 MacBook Pro, 2.3 GHz Dual-Core Intel Core i5, and compiling with Apple clang version 11.0.0 (clang-1100.0.33.12).

The reason is the branch predictor. On the first run the computer knows nothing about the program, but while executing it learns the pattern of the jumps in your code (the for loops and the if) and can then better predict which branch to take. Modern processors have long instruction pipelines, so correctly predicting a jump can significantly reduce execution time.
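
As a side note, the effect of branch predictability is easy to see even within a single run. The sketch below is purely illustrative (the names sum_big and time_it are made up for this example): it sums the same data twice with the same branchy loop, once shuffled and once sorted. On most hardware the sorted pass is noticeably faster because the branch becomes trivially predictable; depending on the compiler and optimization level the gap may be larger or smaller.

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

// Sum every element greater than 127. The branch inside the loop is easy to
// predict when the data is sorted and hard to predict when it is shuffled.
long long sum_big(const std::vector<int>& data) {
  long long sum = 0;
  for (int x : data) {
    if (x > 127) {        // this is the branch the predictor has to guess
      sum += x;
    }
  }
  return sum;
}

int main() {
  std::vector<int> data(1 << 22);
  std::mt19937 rng(42);
  std::uniform_int_distribution<int> dist(0, 255);
  for (int& x : data) x = dist(rng);

  auto time_it = [&](const char* label) {
    auto start = std::chrono::steady_clock::now();
    long long s = sum_big(data);
    auto stop = std::chrono::steady_clock::now();
    std::cout << label << ": sum=" << s << " in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";
  };

  time_it("shuffled");                    // unpredictable branch
  std::sort(data.begin(), data.end());
  time_it("sorted");                      // predictable branch
  return 0;
}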

So to compare a few algorithms by execution time, it is good practice to run each one a hundred times and take the smallest time.

Given the very large difference, I would guess that the CPU is in a lower performance mode when you start the first run; under the load of that first run the OS then switches it into a higher performance mode, which you observe as a lower execution time.

Make sure your notebook is connected to AC power and that all power-saving options are disabled if you want to avoid the effect.

In any case there will still be caching effects (e.g. the contents of the executable may be cached in memory), but these shouldn't be on the order of 100 ms, I think.

In general, when you benchmark code you should always do warmup runs, because there will always be effects like this to some degree for one reason or another. You generally want to perform the actual test runs once the environment has reached an equilibrium state, so to speak.

When running a program multiple times, the first time the OS has to load the executable into memory; the next time it is likely to already be present (although relocations may still be necessary depending on compiler/linker settings, namely whether position-independent code is generated). The branch-prediction answer would be much more likely to apply if you were running the same code many times within a single process, which is a good idea when gathering performance data: put the timing code in your program and run the code of interest multiple times, timing each loop, rather than running your entire program multiple times and using an external time program.
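
For example, here is a minimal sketch of that in-process approach (the names run_sieve and the run count are made up for this example; run_sieve repeats the question's sieve logic, and the printing loop is left out so that only the sieve itself is timed):

#include <algorithm>
#include <chrono>
#include <cmath>
#include <iostream>
#include <vector>

// Same sieve as in the question, writing into a vector supplied by the
// caller so the work can be repeated without reallocating.
void run_sieve(std::vector<int>& seive) {
  int n = static_cast<int>(seive.size());
  for (int i = 0; i < n; i++) {
    seive[i] = i;
  }
  for (int i = 2; i < std::ceil(std::sqrt(n)); i++) {
    for (int j = i * 2; j <= n; j += i) {
      seive[j - 1] = -9;
    }
  }
}

int main() {
  const int n = 100000;
  const int runs = 100;
  std::vector<int> seive(n);

  double best_ms = 1e300;
  for (int r = 0; r < runs; r++) {
    auto start = std::chrono::steady_clock::now();
    run_sieve(seive);
    auto stop = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    std::cout << "run " << r << ": " << ms << " ms\n";
    best_ms = std::min(best_ms, ms);    // keep the smallest time seen
  }
  std::cout << "best: " << best_ms << " ms\n";
  return 0;
}

The first iteration or two can then be treated as warmup, and the reported minimum is the number to compare between algorithms, as suggested above.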
