I am programming a physics simulation with few particles (typically 3, no more than 5).
In condensed form, my code is structured like this:
```cpp
#include <cmath>
#include <iostream>

struct Particle {
    double x; // coordinate
    double m; // mass
};

void performStep(Particle &p, double &F_external) {
    p.x += -0.2*p.x + F_external/p.m; // boiled down, in reality complex calculation, not important here
}

int main() {
    const double dt = 0.001; // time step, not important
    Particle p1;
    p1.x = 5; // some random number for initialization, in reality more complex but not important here
    p1.m = 1;
    Particle p2;
    p2.x = -1; // some random number for initialization, in reality more complex but not important here
    p2.m = 2;
    Particle p3;
    p3.x = 0; // some random number for initialization, in reality more complex but not important here
    p3.m = 3;
    double F_external = 0; // external forces
    for (unsigned long long int i = 0; i < 10000000000; ++i) { // many steps, typically 10e9
        F_external = std::sin(i*dt);
        performStep(p1, F_external);
        performStep(p2, F_external);
        performStep(p3, F_external);
    }
    std::cout << "p1.x: " << p1.x << std::endl;
    std::cout << "p2.x: " << p2.x << std::endl;
    std::cout << "p3.x: " << p3.x << std::endl;
}
```
I have determined with `clock()` that the `performStep(p, F_external)` call is the bottleneck in my code. When I tried doing the calculation inline, i.e. replacing `performStep(p1, F_external)` by `p1.x += -0.2*p1.x + F_external/p1.m;`, the calculation suddenly was roughly a factor of 2 faster. Note that `performStep()` in reality is about 60 basic arithmetic operations over ~20 lines, so the code becomes really bloated if I just inline it by hand for every particle.
Why is that the case? I am compiling with MinGW64/g++ and the `-O2` flag. I thought the compiler would optimize such things.
Edit:
Here is the function that is called. Note that in reality I calculate all three coordinates x, y, z with a couple of different external forces. Variables that are not passed to the function are members of `SimulationRun`. The algorithm is a fourth-order leapfrog algorithm.
```cpp
void SimulationRun::performLeapfrog_z(const unsigned long long int& i, const double& x, const double& y, double& z, const double& vx, const double& vy, double& vz, const double& qC2U0,
    const double& U0, const double& m, const double& C4, const double& B2, const double& f_minus, const double& f_z, const double& f_plus, const bool& bool_calculate_xy,
    const double& Find, const double& Fheating) {
    // probing for C4 == 0 and B2 == 0 saves some computation time
    if (C4 == 0) {
        Fz_C4_Be = 0;
    }
    if (B2 == 0 || !bool_calculate_xy) {
        Fz_B2_Be = 0;
    }
    // stage 1: drift with c1, then kick with d1
    z1 = z + c1 * vz * dt;
    if (C4 != 0 && !bool_calculate_xy) {
        Fz_C4_Be = (-4) * q * C4 * U0 * z1 * z1 * z1;
    }
    else if (C4 != 0 && bool_calculate_xy) {
        Fz_C4_Be = q * C4 * U0 * (-4 * z1 * z1 * z1 + 6 * z1 * (x * x + y * y));
    }
    if (B2 != 0 && bool_calculate_xy) {
        Fz_B2_Be = q * B2 * (-vx * z1 * y + vy * z1 * x);
    }
    acc_z1 = (qC2U0 * (-2) * z1 + Find + Fz_C4_Be + Fz_B2_Be + Fheating) / m;
    vz1 = vz + d1 * acc_z1 * dt;
    // stage 2: drift with c2, then kick with d2
    z2 = z1 + c2 * vz1 * dt;
    if (C4 != 0 && !bool_calculate_xy) {
        Fz_C4_Be = (-4) * q * C4 * U0 * z2 * z2 * z2;
    }
    else if (C4 != 0 && bool_calculate_xy) {
        Fz_C4_Be = q * C4 * U0 * (-4 * z2 * z2 * z2 + 6 * z2 * (x * x + y * y));
    }
    if (B2 != 0 && bool_calculate_xy) {
        Fz_B2_Be = q * B2 * (-vx * z2 * y + vy * z2 * x);
    }
    acc_z2 = (qC2U0 * (-2) * z2 + Find + Fz_C4_Be + Fz_B2_Be + Fheating) / m;
    vz2 = vz1 + d2 * acc_z2 * dt;
    // stage 3: drift with c3, kick with d3, then final drift with c4
    z3 = z2 + c3 * vz2 * dt;
    if (C4 != 0 && !bool_calculate_xy) {
        Fz_C4_Be = (-4) * q * C4 * U0 * z3 * z3 * z3;
    }
    else if (C4 != 0 && bool_calculate_xy) {
        Fz_C4_Be = q * C4 * U0 * (-4 * z3 * z3 * z3 + 6 * z3 * (x * x + y * y));
    }
    if (B2 != 0 && bool_calculate_xy) {
        Fz_B2_Be = q * B2 * (-vx * z3 * y + vy * z3 * x);
    }
    acc_z3 = (qC2U0 * (-2) * z3 + Find + Fz_C4_Be + Fz_B2_Be + Fheating) / m;
    vz3 = vz2 + d3 * acc_z3 * dt;
    z = z3 + c4 * vz3 * dt;
    vz = vz3;
}
```
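Optimization is hard, even for compilers. Here are some optimization tips:

`performStep` is the hotspot, so put its definition into a header file (in case you split declaration and definition into header/source files) and add the `inline` keyword. If the body lives in a different .cpp file from the loop that calls it, the compiler only sees a declaration at the call site and cannot inline the call unless link-time optimization is enabled. Note that the version below also takes `F_external` by value instead of by reference:

```cpp
// in file xxx.h
inline void performStep(Particle &p, double F_external){
    p.x += -0.2*p.x + F_external/p.m; // boiled down, in reality complex calculation, not important here
}
```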