
Why does writing to file increase cache-misses and branch-misses so much?

Following is the code I am profiling:

#include <iostream>
#include <fstream>

#define N 10000

using namespace std;

int main()
{
    ofstream fout;
    fout.open("log.txt");

    int A[N], B[N], C[N];

    for(int i=0; i<N; i++)
    {
        A[i] = B[i] = i;
    }   

    int sum = 0;

    for(int j=0; j<N; j++)
    {
        C[j] = A[j]+B[j];
        //fout<<C[j]<<endl;
        sum += C[j];
        sum %= 103;
    }

    cout<<sum<<endl;

    return 0;
}

Following is the profiling command:

perf stat -e instructions:u -e instructions:k -e cache-misses -e page-faults -e branch-misses ./test

Output is:

Performance counter stats for './test':

 15,60,186      instructions:u           
  8,35,753      instructions:k           
    24,345      cache-misses                                                
       123      page-faults                                                 
    13,051      branch-misses                                               

 0.001327182 seconds time elapsed

However, when I uncomment that single commented line, I get the following output:

Performance counter stats for './test':

         75,72,868      instructions:u           
      12,29,31,625      instructions:k           
          2,18,333      cache-misses                                                
               121      page-faults                                                 
            73,662      branch-misses                                               

       0.525844017 seconds time elapsed

I am not able to understand what is causing such a huge increase in cache-misses and moderately high increase in branch-misses. Any insights would be appreciated!

Without the fout<<C[j]<<endl; line, the significant part of your program runs entirely in user space. Uncommenting that line, which sits inside the loop, introduces a lot of additional system calls; this shows up as the huge increase in the instructions:k count reported by the profiler. System calls are expensive: each one switches into the kernel, and the kernel code that runs there uses its own instructions and data, which compete with your program for the CPU cache and the branch predictor. Depending on the hardware architecture and the OS, this may evict or invalidate a noticeable part of the CPU cache. Note also that the cache-misses and branch-misses events in your command carry no :u suffix, so they include the misses incurred by the kernel while it services those calls.

Note that the main culprit here is endl, which writes a newline and then forces the stream buffer to be flushed, triggering a system call on every iteration. Replace it with '\n' and the impact on performance should be much smaller.
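To make the difference between endl and '\n' visible without perf, here is a minimal sketch that counts flushes. CountingBuf, write_with_endl, write_with_newline and count_flushes are hypothetical names introduced for illustration only; the sketch uses an in-memory buffer instead of your log.txt, so it counts flush requests rather than actual system calls.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>

// Hypothetical helper: an in-memory stream buffer that counts flush requests.
// std::endl writes '\n' and then calls flush(), which ends up in sync().
class CountingBuf : public std::stringbuf {
public:
    std::size_t flushes = 0;

protected:
    int sync() override {
        ++flushes;  // one flush request reached the buffer
        return 0;   // report success
    }
};

// Writes 0..n-1, one value per line, ending each line with std::endl.
void write_with_endl(std::ostream& out, int n) {
    for (int i = 0; i < n; ++i) {
        out << i << std::endl;
    }
}

// Writes 0..n-1, one value per line, ending each line with a plain '\n'.
void write_with_newline(std::ostream& out, int n) {
    for (int i = 0; i < n; ++i) {
        out << i << '\n';
    }
}

// Runs one of the writers against a fresh CountingBuf and
// returns how many flush requests the writer caused.
std::size_t count_flushes(void (*writer)(std::ostream&, int), int n) {
    CountingBuf buf;
    std::ostream out(&buf);
    writer(out, n);
    return buf.flushes;
}
```

Calling count_flushes(write_with_endl, 10000) reports one flush per line, while count_flushes(write_with_newline, 10000) reports none. With a real ofstream, each flush that has pending data typically becomes a write system call, so in your program the change amounts to writing fout<<C[j]<<'\n'; and letting the stream flush when its buffer fills or when it is closed.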
