Performance penalty for large C++ dll's with autogenerated C code

Question

I am working on a piece of software that needs to call a family of optimisation solvers. Each solver is an auto-generated piece of C code, with thousands of lines of code. I am using 200 of these solvers, differing only in the size of optimisation problem to be solved.

All-in-all, these auto-generated solvers come to about 180MB of C code, which I compile to C++ using the extern "C"{ /*200 solvers' headers*/ } syntax, in Visual Studio 2008. Compiling all of this is very slow (with the "maximum speed /O2" optimisation flag, it takes about 8hours). For this reason I thought it would be a good idea to compile the solvers into a single DLL, which I can then call from a separate piece of software (which would have a reasonable compile time, and allow me to abstract away all this extern "C" stuff from higher-level code). The compiled DLL is then about 37MB.

The problem is that when executing one of these solvers using the DLL, execution requires about 30ms. If I were to compile only that single one solvers into a DLL, and call that from the same program, execution is about 100x faster (<1ms). Why is this? Can I get around it?

The DLL looks as below. Each solver uses the same structures (ie they have the same member variables), but they have different names, hence all the type casting.

extern "C"{
#include "../Generated/include/optim_001.h"
#include "../Generated/include/optim_002.h"
/*etc.*/
#include "../Generated/include/optim_200.h"
}

namespace InterceptionTrajectorySolver
{

__declspec(dllexport) InterceptionTrajectoryExitFlag SolveIntercept(unsigned numSteps, InputParams params, double* optimSoln, OutputInfo* infoOut)
{
  int exitFlag;

  switch(numSteps)
  {
  case   1:
    exitFlag = optim_001_solve((optim_001_params*) &params, (optim_001_output*) optimSoln, (optim_001_info*) &infoOut);
    break;
  case   2:
    exitFlag = optim_002_solve((optim_002_params*) &params, (optim_002_output*) optimSoln, (optim_002_info*) &infoOut);
    break;
  /*
    ...
    etc.
    ...
  */
  case   200:
    exitFlag = optim_200_solve((optim_200_params*) &params, (optim_200_output*) optimSoln, (optim_200_info*) &infoOut);
    break;
  }

  return exitFlag;
};

};

Answer 1

I do not know if your code is inlined into each case part in the example. If your functions are inline functions and you are putting it all inside one function then it will be much slower because the code is laid out in virtual memory, which will require much jumping around for the CPU as the code is executed. If it is not all inlined then perhaps these suggestions might help.

Your solution might be improved by...

A) 1) Divide the project into 200 separate dlls. Then build with a .bat file or similar. 2) Make the export function in each dll called "MyEntryPoint", and then use dynamic linking to load in the libraries as they are needed. This will then be the equivalent of a busy music program with a lot of small dll plugins loaded. Take a function pointer to the EntryPoint with GetProcAddress.

Or...

B) Build each solution as a separate .lib file. This will then compile very quickly per solution and you can then link them all together. Build an array of function pointers to all the functions and call it via lookup instead.

result = SolveInterceptWhichStep;

Combine all the libs into one big lib should not take eight hours. If it takes that long then you are doing something very wrong.

AND...

Try putting the code into different actual .cpp files. Perhaps that specific compiler will do a better job if they are all in different units etc... Then once each unit has been compiled it will stay compiled if you do not change anything.

Answer 2

Make sure that you measure and average the timing multiple calls to the optimizer, because it could be that there's a large overhead to the setup before the first call.

Then also check what that 200-branch conditional statement (your switch) is doing to your performance! Try eliminating that switch for testing, calling just one solver in your test project but linking all of them in the DLL. Do you still see slow performance?

Answer 3

I assume the reason you are generating the code is for better run-time performance, and also for better correctness. I do the same thing.

I suggest you try this technique to find out what the run-time performance problem is.

If you're seeing a 100:1 performance difference, that means each time you interrupt it and look at the program's state, there is a 99% chance you will see what the problem is.

As far as build time goes, sure it makes sense to modularize it. None of that should have much effect on run time, unless it means you're doing crazy I/O.

Performance penalty for large C++ dll's with autogenerated C code

Question

3 answers

solution1
1 ACCPTED

solution2
0 2012-09-07 08:17:27

solution3
0 2012-09-07 11:59:31

Performance penalty for large C++ dll's with autogenerated C code

Question

3 answers

solution1 1 ACCPTED

solution2 0 2012-09-07 08:17:27

solution3 0 2012-09-07 11:59:31

solution1
1 ACCPTED

solution2
0 2012-09-07 08:17:27

solution3
0 2012-09-07 11:59:31