When I compile the following code on a regularly updated 64-bit Ubuntu 16.04 with GCC, using
gcc source.c -O3 -ffast-math
the resulting executable takes about 45 seconds of CPU time to run. But on the same machine, under 64-bit Windows 7 with Visual Studio 2012 in Release mode, it takes less than 10 seconds of CPU time. What is the main cause of this difference? Have I not used enough of GCC's optimization options? Is Visual Studio's compiler simply better? Or is it something else?
#include <stdio.h>
#include <math.h>
#include <time.h>

#define Nx 1000

int main(void)
{
    double d = 0.015e-2; // meter
    double V0 = 400; // volt
    double De = 1800e-4; // m^2 per sec
    double mu_e = 2.9e1 / 760; // m^2 per volt sec
    double n0 = 1e19; // per m^3
    double e_eps = 1.602e-19 / 8.854e-12;
    double ne[Nx], je[Nx], E[Nx];
    double dx = d / (Nx - 1);
    double dt = 1e-14; // s
    const int Nt = 500000;
    int i, k;
    double sum;
    FILE *fp_ne, *fp_E;
    double alpha, exp_alpha, R;
    int ESign = -1;
    clock_t start_t, end_t;

    start_t = clock();

    // initialization
    for (i = 1; i < Nx; i++)
        ne[i] = n0;
    ne[0] = 1e-4 * n0;
    for (i = 0; i < Nx; i++)
        E[i] = -V0 / d;

    // time loop
    for (k = 0; k < Nt; k++)
    {
        if (k % 1000 == 0) printf("k = %d\n", k);
        for (i = 0; i < Nx - 1; i++)
        {
            alpha = mu_e * dx * E[i] / De;
            exp_alpha = exp(alpha);
            R = (exp_alpha - 1) / alpha;
            je[i] = (De / (dx * R)) * (ne[i] - exp_alpha * ne[i + 1]);
        }
        for (i = 1; i < Nx - 1; i++)
            ne[i] += -dt / dx * (je[i] - je[i - 1]);
        ne[Nx - 1] = ne[Nx - 2];
        sum = 0;
        for (i = 0; i < Nx - 1; i++)
            sum += dx * je[i];
        for (i = 0; i < Nx - 1; i++)
        {
            E[i] += -dt * e_eps * (sum / d - je[i]);
            if (E[i] >= 0) ESign = +1;
        }
        if (ESign == 1) break;
    }

    // output
    printf("time=%e\n", k * dt);
    fp_ne = fopen("ne.txt", "w");
    fp_E = fopen("E.txt", "w");
    fprintf(fp_ne, "# x (cm)\tne(per cm^3)\n");
    fprintf(fp_E, "# x (cm)\tE(V/cm)\n");
    for (i = 0; i < Nx; i++)
        fprintf(fp_ne, "%f\t%e\n", i * dx * 100, ne[i] / 1e6);
    for (i = 0; i < Nx - 1; i++)
        fprintf(fp_E, "%f\t%e\n", i * dx * 100, fabs(E[i]) / 1e2);
    fclose(fp_ne);
    fclose(fp_E);

    end_t = clock();
    printf("CPU time = %f\n", (double)(end_t - start_t) / CLOCKS_PER_SEC);
}
The first thing I did was comment out the in-loop I/O:
//if (k%1000==0) printf("k = %d\n", k);
With only that change, I obtained the timings below. The fprintf calls at the end do influence the timings significantly, but not their relative differences, so I'm not going to measure all of these again.
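To take the fprintf cost out of the measurement entirely rather than reason about it, the end timestamp can simply be taken before the output section. A minimal sketch of the pattern (timed_sum and its toy loop are my own, not part of the program above):

```c
#include <stdio.h>
#include <time.h>

/* Sketch: measure only the computation. The end timestamp is taken BEFORE
 * any file output, so fprintf cost cannot leak into the reported CPU time.
 * Returns a checksum so the loop cannot be optimized away. */
double timed_sum(long n)
{
    clock_t start_t = clock();
    double s = 0.0;
    for (long i = 0; i < n; i++)
        s += (double)i;
    clock_t end_t = clock();    /* end of timed region, before any I/O */

    printf("loop CPU time = %f\n", (double)(end_t - start_t) / CLOCKS_PER_SEC);
    /* fprintf(...) of results would go here, outside the timed region */
    return s;
}
```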
I got these timings on my Arch Linux first-gen Core i5 (all compiled with the standard -O2):
GCC 7.1:
CPU time = 23.459520
Clang 4.0.1:
CPU time = 22.936315
Intel 17.0.4:
CPU time = 7.830828
In a QEMU/libvirt Windows 10 virtual machine on that same machine, I get these timings:
MinGW-w64 GCC 6.3:
CPU time = 76.122000
VS 2015.3:
CPU time = 13.497000
VS 2017:
CPU time = 49.306000
On WINE (native Linux, but with Win32 API emulation; should still be comparable to native Linux code execution):
MinGW-w64 GCC 6.3:
CPU time = 56.074000
VS 2015.3:
CPU time = 12.048000
VS 2017:
CPU time = 34.541000
Long story short: for this particular problem, the Intel compiler on Linux and VS 2015 on Windows seem to output the best code.
Looking at the generated assembly is the only way to get to the bottom of this, but properly analysing it is beyond me.