
Splitting a single file into multiple files in C - performance aspect

I have found a similar post on this topic, but it addresses the design aspect rather than performance, so I am posting this to understand how breaking up a big C file affects compile and execution time.

I have a big utils file (we all know how quickly they grow). I am trying to understand whether splitting the file into module-based function files (cookies.c, memcacheutils.c, stringutils.c, search.c, sort.c, arrayutils.c, etc.) would add any penalty to compile and execution time.

My common sense says it would add some penalty, as the code now has to find pointers in far-flung places rather than in the same file.

I could be horribly wrong or partially correct. Seeking guidance from all the gurus. My current utils file is around 150k with 80+ functions.

Thank you for reading the post.

Generally splitting your project into multiple compilation units allows for better project management and faster partial compilation. When you edit one file you only need to recompile that compilation unit and relink in order to test & debug.

Depending on your compiler, though, having everything in one file may allow for additional inlining and function-level optimisation, all at the cost of compilation time.
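
For illustration, a split along the lines the question already mentions might look like the sketch below (file and function names are just examples); after editing stringutils.c, only that unit is recompiled and the program is relinked:

    /* stringutils.h -- declarations shared with other modules */
    #ifndef STRINGUTILS_H
    #define STRINGUTILS_H
    #include <stddef.h>
    size_t su_trimmed_len(const char *s);   /* example prototype */
    #endif /* STRINGUTILS_H */

    /* stringutils.c -- compiled on its own */
    #include <string.h>
    #include "stringutils.h"
    size_t su_trimmed_len(const char *s) {
        while (*s == ' ') s++;              /* skip leading blanks */
        return strlen(s);
    }

    /* main.c -- only needs the header */
    #include <stdio.h>
    #include "stringutils.h"
    int main(void) {
        printf("%zu\n", su_trimmed_len("  hello"));
        return 0;
    }

    /* Example build (gcc shown purely as an illustration):
         gcc -c stringutils.c     # redone only when this file changes
         gcc -c main.c
         gcc -o app main.o stringutils.o
    */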

You should always segment your sources into logical units.

This also has the benefit of faster compilation, because you don't need to recompile everything for every single change. Maintaining such a source file is horrible at best, and keeping track of production-relevant changes is also problematic.

There is no performance gain/penalty if a function resides in a different module; at worst it will be a single additional jmp instruction. If your code really depends on machine cycles, then you should start by reconsidering the design of the algorithm.
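
One hedged addition: if a tiny, hot helper must avoid even that extra call/jmp, a common C idiom is to define it as static inline in a shared header so each module gets an inlinable copy (arrayutils.h and au_max below are hypothetical names):

    /* arrayutils.h -- hypothetical header-only helper */
    #ifndef ARRAYUTILS_H
    #define ARRAYUTILS_H

    /* static inline: every translation unit that includes this header
       can inline the body, so calls from other files cost nothing extra */
    static inline int au_max(int a, int b) { return a > b ? a : b; }

    #endif /* ARRAYUTILS_H */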

This used to matter when you had 16-bit PCs with different segments. Far (and worse, "huge") pointers carried a performance cost as you had to start fooling around with segment registers.

Nowadays, with 32-bit addressing, there should be no cost. Ultimately, if you're that worried about performance, you begin to consider "jump tables" in assembly, which require the target address to be within a short distance of the current instruction.

In C, then, you really should aim to put your code in different modules (read about the theoretical issues of software "cohesion" and "coupling"). There should be no difference in execution time. As far as compile time goes, it "depends" - especially if you are including files repeatedly. In a big project, having multiple files is a massive time saver, as you can recompile only the unit of code that changed. In a small project, compilation time is so small that it is hardly worth worrying about.
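
On the "including files repeatedly" point, the usual include-guard idiom keeps repeated inclusion cheap, since the preprocessor skips the guarded body after the first time (sort.h and sort_ints are placeholder names):

    /* sort.h */
    #ifndef SORT_H
    #define SORT_H

    void sort_ints(int *a, int n);   /* placeholder prototype */

    #endif /* SORT_H */

    /* A second #include "sort.h" in the same translation unit re-reads
       the file, but everything inside the guard is skipped, so the
       compile-time cost of repeated inclusion stays small. */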

Compile time would change.

(Note - any system or project that can do an incremental build would get faster.)

If there are no changes to the code besides splitting it into files, then the end result would not change.

If you include debug information in your build, then the final output would change with more files, but I would not expect a performance difference.


Side note: I don't think there is a single programmer who has worked with large systems who would tell you not to split the file. You simply have to in order to make a large system maintainable. I can't say whether your system is at that point yet, but there is no harm in doing it early. Split the file.

This would not add any performance penalty. And even if it did, it is a premature optimization. The only thing that matters is development time.

If you ever find that you've already made sure all your algorithms have optimal complexity, tweaked all the inner loops for maximum performance, and still need to shave a few picoseconds off the runtime, you can always create a source file that simply #includes all the split sources to feed them to the compiler as one big chunk.
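
A minimal sketch of that trick, assuming the file names from the question (sometimes called a "unity" or amalgamation build); you compile only this one file instead of the individual sources:

    /* all_in_one.c -- hypothetical amalgamation unit */
    #include "cookies.c"
    #include "memcacheutils.c"
    #include "stringutils.c"
    #include "search.c"
    #include "sort.c"
    #include "arrayutils.c"

One caveat: because everything now lands in a single translation unit, file-local (static) names that clash across the original files would need to be renamed.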

With regard to runtime performance, I would consider running some performance measurements, depending on how sensitive you are to any performance loss. The consensus from the answers so far is that runtime performance would not be degraded by splitting the file into smaller units, but this depends on your definition of "performance".

If you are really concerned about even the slightest performance loss, then unless you have whole-program optimization enabled (and it is effective), there is a slight possibility that the compiler will miss some opportunities for optimization when your file is split up (depending, of course, on the style of the code, the use of globals, the use of inlining (keep in mind that in some cases not inlining might yield better results), static classes/methods if you are using C++, etc.).
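
As a concrete, hedged example of such a missed opportunity: a small function compiled separately stays an ordinary call across files, while link-time optimization (for instance the -flto switch that GCC and Clang provide) lets the optimizer inline it again (find_first below is a made-up helper):

    /* search.c -- hypothetical helper in its own file */
    int find_first(const int *a, int n, int key) {
        for (int i = 0; i < n; i++)
            if (a[i] == key) return i;
        return -1;
    }

    /* main.c */
    int find_first(const int *a, int n, int key);   /* prototype */

    int main(void) {
        int v[] = { 3, 1, 4, 1, 5 };
        /* Built as separate objects at -O2 this stays a real call;
           compiling and linking both files with -flto (e.g.
           gcc -O2 -flto) allows cross-file inlining again. */
        return find_first(v, 5, 4);
    }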

I suspect that in some edge cases, having a single source file could give marginal performance improvements (and in other cases, it could degrade performance!). Testing before and after with a few simple scenarios, including varying the compiler's optimization level, would be quite an interesting experiment.

I don't think you will find any hard-and-fast rules such as "it is always okay to split a large set of related functions into two source files", but you might find that for specific compiler settings and source files, splitting up the files causes subtle effects, such as changing the behaviour of the instruction cache (depending on how fine-grained your performance testing is).
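
If you do run such measurements, a small timing harness along these lines (POSIX clock_gettime; utils_hot_path is a stand-in for whatever routine you actually want to compare before and after the split) is usually enough to show whether anything changed:

    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>

    /* Stand-in for the code under test (hypothetical name); replace the
       body with a call into your own utils functions. */
    static long utils_hot_path(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++)
            sum += i % 7;
        return sum;
    }

    int main(void) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        long result = utils_hot_path(10 * 1000 * 1000L);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("result=%ld  elapsed=%.3f ms\n", result, ms);
        return 0;
    }

Depending on the toolchain, older systems may additionally need -lrt at link time for clock_gettime.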
