Following is the code I am profiling:
#include <iostream>
#include <fstream>
#define N 10000
using namespace std;

int main()
{
    ofstream fout;
    fout.open("log.txt");
    int A[N], B[N], C[N];
    for(int i=0; i<N; i++)
    {
        A[i] = B[i] = i;
    }
    int sum = 0;
    for(int j=0; j<N; j++)
    {
        C[j] = A[j]+B[j];
        //fout<<C[j]<<endl;
        sum += C[j];
        sum %= 103;
    }
    cout<<sum<<endl;
    return 0;
}
Following is the profiling command:
perf stat -e instructions:u -e instructions:k -e cache-misses -e page-faults -e branch-misses ./test
Output is:
Performance counter stats for './test':
15,60,186 instructions:u
8,35,753 instructions:k
24,345 cache-misses
123 page-faults
13,051 branch-misses
0.001327182 seconds time elapsed
However, when I uncomment that single commented line, I get the following output:
Performance counter stats for './test':
75,72,868 instructions:u
12,29,31,625 instructions:k
2,18,333 cache-misses
121 page-faults
73,662 branch-misses
0.525844017 seconds time elapsed
I am not able to understand what is causing such a huge increase in cache-misses and moderately high increase in branch-misses. Any insights would be appreciated!
Without the "fout<<C[j]<<endl;" line, the significant part of your program runs entirely in user space. Uncommenting that line, which sits inside the loop, adds a system call on every iteration; this shows up as the huge increase in the instructions:k count reported by perf. System calls are expensive because each one switches from user mode into the kernel and back, and the kernel code that runs in between may, depending on the hardware architecture and the OS, evict a noticeable part of your program's data from the CPU cache. Your cache-misses and branch-misses events carry no :u or :k modifier, so they also count whatever the kernel itself incurs while servicing those calls.
Note that the main culprit here is endl, which forces the stream buffer to be flushed and thus triggers a system call each time. Replace it with '\n' and the impact on performance should be much smaller.
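To make the difference concrete, here is a minimal sketch that writes the same lines both ways. The helper names write_lines and count_lines, and the output file names in the usage note, are made up for illustration and are not part of the original program. The only difference between the two modes is whether each line ends with std::endl (newline plus flush) or '\n' (newline only).

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: writes the integers 0..n-1 to `path`, one per line.
// flush_each_line == true mimics the loop from the question (std::endl);
// false writes '\n' and leaves flushing to the ofstream's internal buffer.
bool write_lines(const std::string& path, int n, bool flush_each_line)
{
    assert(n >= 0);
    std::ofstream fout(path);
    if (!fout) return false;
    for (int j = 0; j < n; j++) {
        if (flush_each_line)
            fout << j << std::endl;  // newline + flush on every iteration
        else
            fout << j << '\n';       // newline only, data stays buffered
    }
    fout.flush();                    // one final flush covers both variants
    return fout.good();
}

// Hypothetical helper: counts the lines in `path`, to confirm that both
// variants produce a file with the same number of lines.
int count_lines(const std::string& path)
{
    std::ifstream fin(path);
    std::string line;
    int count = 0;
    while (std::getline(fin, line)) ++count;
    return count;
}
```

Calling write_lines("endl.txt", 10000, true) in one build and write_lines("newline.txt", 10000, false) in another, and running each under the same perf stat command as above, should show the instructions:k gap directly. The exact numbers will depend on your machine, kernel, and standard library buffer size.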