
How much do forward declarations affect compile time?

I am very interested in studies or empirical data showing a comparison of compilation times between two C++ projects that are identical except that one uses forward declarations where possible and the other uses none.

How drastically can forward declarations change compilation time as compared to full includes?

#include "myClass.h"

vs.

class myClass;

Are there any studies that examine this?

I realize that this is a vague question that greatly depends on the project. I don't expect a hard number for an answer. Rather, I'm hoping someone may be able to direct me to a study about this.

The project I'm specifically worried about has about 1200 files. Each cpp on average has 5 headers included. Each header has on average 5 headers included. This nesting runs about 4 levels deep. It would seem that for each cpp compiled, around 300 headers must be opened and parsed, some many times. (There are many duplicates in the include tree.) There are include guards, but the files are still opened. Each cpp is separately compiled with gcc, so there's no header caching.

To be sure no one misunderstands, I certainly advocate using forward declarations where possible. My employer, however, has banned them. I'm trying to argue against that position.

Thank you for any information.

Forward declarations can make for neater, more understandable code, which surely HAS to be the goal of any decision.

Couple that with the fact that, when it comes to classes, it's quite possible for two classes to rely upon each other, which makes it rather hard NOT to use forward declarations without causing a nightmare (see the sketch below).
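
A minimal sketch of that circular case (Parent and Child are hypothetical names): neither header can include the other without creating an include cycle, so at least one side has to forward declare:

// Parent.h
#ifndef PARENT_H
#define PARENT_H

class Child;            // forward declaration: including Child.h here would create a cycle

class Parent {
    Child* firstChild;  // a pointer to an incomplete type is fine
};

#endif

// Child.h
#ifndef CHILD_H
#define CHILD_H

class Parent;           // the same trick in the other direction

class Child {
    Parent* parent;
};

#endif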

Equally, forward declaring classes in a header means that you only need to include the relevant headers in the CPPs that actually USE those classes. That actually DECREASES compile time.
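
For example, reusing myClass from the question (User is a hypothetical name): the header gets by with a forward declaration, and only the .cpp that actually uses myClass pays for the include:

// user.h
class myClass;          // no #include "myClass.h" needed in the header

class User {
public:
    void process(myClass& m);  // references and pointers don't require the full definition
};

// user.cpp
#include "user.h"
#include "myClass.h"    // the full definition is needed only where members are actually used

void User::process(myClass& m) {
    // ... call m's methods here
}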

Edit: Given your comment above, I would point out that it is ALWAYS slower to include a header file than to forward declare. Any time you include a header you are forcing a load from disk, often only to find out that the header guards mean nothing happens. That wastes immense amounts of time and it is really a VERY stupid rule to be bringing in.

Edit 2: Hard data is pretty hard to obtain. Anecdotally, I once worked on a project that wasn't strict about its header includes, and the build time was roughly 45 minutes on a P3-500MHz with 512MB of RAM (this was a while back). After spending 2 weeks cutting down the include nightmare (by using forward declarations), I managed to get the code to build in a little under 4 minutes. Subsequently, using forward declarations wherever possible became a rule.

Edit 3: It's also worth bearing in mind that there is a huge advantage to using forward declarations when it comes to making small modifications to your code. If headers are included all over the shop, then a modification to a header file can cause vast numbers of files to be rebuilt.

I also note lots of other people extolling the virtues of pre-compiled headers (PCHs). They have their place, and they can really help, but they shouldn't be used as an alternative to proper forward declarations. Otherwise, modifications to header files can cause recompilation of lots of files (as mentioned above) as well as triggering a PCH rebuild. PCHs can provide a big win for things like pre-built libraries, but they are no reason not to use proper forward declarations.

Have a look in John Lakos's excellent Large-Scale C++ Software Design book -- I think he has some figures on forward declaration, obtained by looking at what happens if you include N headers M levels deep.

If you don't use forward declarations, then aside from increasing the total build time from a clean source tree, it also vastly increases the incremental build time, because header files are included unnecessarily. Say you have 4 classes, A, B, C and D. C uses A and B in its implementation (i.e. in C.cpp) and D uses C in its implementation. The interface of D is forced to include C.h because of this 'no forward declaration' rule. Similarly, C.h is forced to include A.h and B.h, so whenever A or B is changed, D.cpp has to be rebuilt even though it has no direct dependency on them. As the project scales up, this means that touching any header has a massive knock-on effect, causing huge amounts of code to be rebuilt that just doesn't need to be.
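
To make that concrete, here is a sketch of C's header under both policies (the file contents are illustrative, not from the project in question):

// C.h under the 'no forward declaration' rule: full definitions are pulled in
// even though C only uses A and B by pointer.
#include "A.h"
#include "B.h"
class C {
    A* a;
    B* b;
};
// D.h includes C.h, so touching A.h or B.h rebuilds D.cpp as well.

// C.h with forward declarations: the implementation dependencies stay in C.cpp.
class A;
class B;
class C {
    A* a;
    B* b;
};
// C.cpp includes A.h and B.h; D.cpp never sees them at all.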

To have a rule that disallows forward declarations is (in my book) very bad practice indeed. It's going to waste huge amounts of the developers' time for no gain. The general rule of thumb should be: if the interface of class B depends on class A, then B's header should include A.h; otherwise, forward declare it. In practice, 'depends on' means inherits from, uses as a member variable (by value), or 'uses any methods of'. The Pimpl idiom (sketched below) is a widespread and well understood method for hiding the implementation from the interface, and it allows you to vastly reduce the amount of rebuilding needed in your codebase.
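
A minimal sketch of the Pimpl idiom (Widget, Impl and HeavyDependency are hypothetical names): the public header exposes only a pointer to a forward-declared implementation struct, so the expensive includes move into the .cpp:

// Widget.h
#ifndef WIDGET_H
#define WIDGET_H

#include <memory>

class Widget {
public:
    Widget();
    ~Widget();                    // defined in the .cpp, where Impl is complete
    void draw();
private:
    struct Impl;                  // forward-declared implementation
    std::unique_ptr<Impl> pImpl;
};

#endif

// Widget.cpp
#include "Widget.h"
#include "HeavyDependency.h"      // expensive headers are confined to this file

struct Widget::Impl {
    HeavyDependency dep;
};

Widget::Widget() : pImpl(new Impl) {}
Widget::~Widget() = default;      // safe here: Impl is a complete type
void Widget::draw() { /* work with pImpl->dep */ }

With this layout, a change to HeavyDependency.h triggers a rebuild of Widget.cpp only; anything that merely includes Widget.h is untouched.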

If you can't find the figures from Lakos then I would suggest creating your own experiments and taking timings to prove to your management that this rule is absolutely wrong-headed.

#include "myClass.h"

is 1..n lines

class myClass;

is 1 line.

You will save time unless all your headers are one-liners. There is no impact on the compilation itself: a forward declaration is just a way to tell the compiler that a specific symbol will be defined at link time, and it is only possible when the compiler doesn't need data from that symbol (its size, for example). So the time spent reading the included files is saved every time you replace one with a forward declaration. There's no standard measure for this, as it is a per-project value, but it is a recommended practice for large C++ projects (see Large-Scale C++ Software Design by John Lakos for more tricks for managing large C++ projects, even if some of them are dated).
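
A quick illustration of what the compiler can and cannot do when only a forward declaration is in scope (the usage lines are hypothetical):

class myClass;                    // incomplete type from here on

myClass* makeHandle();            // OK: pointers to incomplete types are fine
void inspect(const myClass&);     // OK: references too
// myClass local;                 // error: the compiler needs myClass's size
// int n = sizeof(myClass);      // error: sizeof requires a complete type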

Another way to limit the time the compiler spends on headers is pre-compiled headers.

I made a small demo which generates an artificial codebase to test this hypothesis. It generates 200 headers. Each header has a struct with 100 fields and a comment 5000 bytes long. 500 .c files are used for benchmarking; each either includes all the header files or forward declares all the structs. To make it more realistic, each header is also included in its own .c file.

The result is that using includes took 22 seconds to compile, while using forward declarations took 9 seconds.

generate.py

#!/usr/bin/env python3

import random
import string

include_template = """#ifndef FILE_{0}_{1}
#define FILE_{0}_{1}

{2}
//{3}

struct c_{0}_{1} {{
{4}}};

#endif
"""

def write_file(name, content):
    f = open("./src/" + name, "w")
    f.write(content)
    f.close()

GROUPS = 200          # number of header groups (one file_<i>_x.h per group)
FILES_PER_GROUP = 0   # extra nested headers per group (disabled for this run)
EXTRA_SRC_FILES = 500 # .c files used for the actual benchmark
COMMENT = ''.join(random.choices(string.ascii_uppercase + string.digits, k=5000))  # 5000-byte filler comment
VAR_BLOCK = "".join(["int var_{0};\n".format(k) for k in range(100)])  # the struct's 100 fields

main_includes = ""
main_fwd = ""
for i in range(GROUPS):
    include_statements = ""
    for j in range(FILES_PER_GROUP):
        write_file("file_{0}_{1}.h".format(i,j), include_template.format(i, j, "", COMMENT, VAR_BLOCK))
        write_file("file_{0}_{1}.c".format(i,j), "#include \"file_{0}_{1}.h\"\n".format(i,j))
        include_statements += "#include \"file_{0}_{1}.h\"\n".format(i, j)
        main_includes += "#include \"file_{0}_{1}.h\"\n".format(i,j)
        main_fwd += "struct c_{0}_{1};\n".format(i,j)
    write_file("file_{0}_x.h".format(i), include_template.format(i, "x", include_statements, COMMENT, VAR_BLOCK))
    write_file("file_{0}_x.c".format(i), "#include \"file_{0}_x.h\"\n".format(i))
    main_includes += "#include \"file_{0}_x.h\"\n".format(i)
    main_fwd += "struct c_{0}_x;\n".format(i)

main_template = """
{0}

int main(void) {{ return 0; }}

"""

for i in range(EXTRA_SRC_FILES):
    write_file("extra_inc_{0}.c".format(i), main_includes)
    write_file("extra_fwd_{0}.c".format(i), main_fwd)

write_file("maininc.c", main_template.format(main_includes))
write_file("mainfwd.c", main_template.format(main_fwd))


run_test.sh

#!/bin/bash

mkdir -p src
./generate.py
ls src/ | wc -l
du -h src/
gcc -v
# Build the include-everything variant, then the forward-declaration variant.
# (The extra_inc_*.c files pair with maininc.c, and extra_fwd_*.c with mainfwd.c.)
echo src/file_*_*.c src/extra_inc_*.c src/maininc.c | xargs time gcc -o inc.out
echo src/file_*_*.c src/extra_fwd_*.c src/mainfwd.c | xargs time gcc -o fwd.out
rm -rf fwd.out inc.out src

Results

$ ./run_test.sh 
    1402
8.2M    src/
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 11.0.3 (clang-1103.0.32.29)
Target: x86_64-apple-darwin19.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
       22.32 real        13.56 user         8.27 sys
        8.51 real         4.44 user         3.78 sys

You've asked a very general question that's elicited some very good general answers. But your question wasn't about your actual problem:

To be sure no one misunderstands, I certainly advocate using forward declarations where possible. My employer, however, has banned them. I'm trying to argue against that position.

We have some information on the project, but not enough:

The project I'm specifically worried about has about 1200 files. Each cpp on average has 5 headers included. Each header has on average 5 headers included. This nesting runs about 4 levels deep. It would seem that for each cpp compiled, around 300 headers must be opened and parsed, some many times. (There are many duplicates in the include tree.) There are include guards, but the files are still opened. Each cpp is separately compiled with gcc, so there's no header caching.

What have you done towards using gcc's precompiled headers? What difference do they make in compile times?

How long does it take to compile a clean build now? How long are your typical (non-clean/incremental) builds? If, as in James McNellis's example from the comments, incremental build times are already on the order of seconds:

The last large C++ project on which I worked was on the order of 1 million SLOC (not including third party libraries). ... We didn't use forward declarations much at all and the whole thing built in 10 minutes. Incremental rebuilds were on the order of seconds.

Then it doesn't really matter how much time would be saved by avoiding includes: shaving seconds off builds surely won't matter for many projects.

Take a small, representative portion of your project and convert it to what you'd like it to be. Measure the difference in compilation time between the unconverted and converted versions of that sample. Remember to touch (or use the equivalent of make --assume-new on) various sets of files to reproduce the real builds you'd encounter while working.

Show your employer how you'd be more productive.

Hmm, the question is rather unclear, and the short answer is: it depends.

In the general case I don't think translation units will become shorter and easier to compile. The main intent of forward declarations is to provide convenience to the programmer.

For people using MS Visual Studio, check out a great plugin called Compile Score by Ramon Viladomat.

It pulls information from Clang or MSBuild (PDB) and shows how much time each file operation takes within the entire build run, separating the front end (preprocessor work) from the back end (actual code generation). You can even see which .cpp files included a specific .h and hunt for the low-hanging fruit to speed up your builds. Lots of options and nifty features. Definitely worth a try if you have large projects.
