Compiling with g++ using multiple cores

Quick question: what is the compiler flag that lets g++ spawn multiple instances of itself in order to compile large projects faster (for example, 4 source files at a time on a multi-core CPU)?

You can do this with make: with GNU make it is the -j flag (this will also help on a uniprocessor machine).

For example, if you want 4 parallel jobs from make:

make -j 4
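
To see what -j parallelizes: given a Makefile whose object rules are independent, make can run those rules concurrently. A minimal sketch (file names hypothetical):

# Four independent compiles; `make -j 4` can run all of them at once.
# (Recipe lines must be indented with tabs.)
OBJS = main.o parser.o lexer.o util.o

app: $(OBJS)
	g++ -o $@ $(OBJS)

%.o: %.cpp
	g++ -c -o $@ $<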

You can also run gcc in a pipe with

gcc -pipe

This uses pipes rather than temporary files between the stages of compilation, which also helps keep the cores busy.
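
Since -pipe is an ordinary compiler flag, it combines naturally with parallel make; assuming the conventional CXXFLAGS variable, for example:

make -j4 CXXFLAGS='-O2 -pipe'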

If you have additional machines available too, you might check out distcc, which will farm compiles out to those as well.
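
A minimal distcc setup is to list the build hosts in DISTCC_HOSTS and prefix the compiler; a sketch with hypothetical host names:

export DISTCC_HOSTS='localhost machine1 machine2'
make -j8 CXX='distcc g++'

The -j count is usually set higher than the local core count so the remote slots stay busy.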

There is no such flag, and having one runs against the Unix philosophy of having each tool perform just one function and perform it well. Spawning compiler processes is conceptually the job of the build system. What you are probably looking for is the -j (jobs) flag to GNU make, a la

make -j4

Or you can use pmake or similar parallel make systems.

People have mentioned make but bjam also supports a similar concept. Using bjam -jx instructs bjam to build up to x concurrent commands.

We use the same build scripts on Windows and Linux and using this option halves our build times on both platforms. Nice.

make will do this for you. Investigate the -j and -l switches in the man page. I don't think g++ itself is parallelizable.
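
The two switches combine: -j caps the number of jobs, while -l stops new jobs from starting while the load average is above a limit. For example (limits chosen arbitrarily):

make -j8 -l4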

If you are using make, pass -j. From man make:

 -j [jobs], --jobs[=jobs]
     Specifies the number of jobs (commands) to run simultaneously. If there is more than one -j option, the last one is effective. If the -j option is given without an argument, make will not limit the number of jobs that can run simultaneously.

Most notably, if you want to script around or detect the number of cores available (which can vary a lot depending on your environment, especially if you run in many environments), you can use the ubiquitous Python function cpu_count():

https://docs.python.org/3/library/multiprocessing.html#multiprocessing.cpu_count

Like this:

make -j $(python3 -c 'import multiprocessing as mp; print(int(mp.cpu_count() * 1.5))')

If you're asking why 1.5, I'll quote user artless-noise from a comment above:

The 1.5 number is because of the noted I/O bound problem. It is a rule of thumb. About 1/3 of the jobs will be waiting for I/O, so the remaining jobs will be using the available cores. A number greater than the cores is better and you could even go as high as 2x.
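
If Python is not available, GNU coreutils' nproc prints the same core count, and the 1.5 multiplier can be approximated with shell integer arithmetic:

make -j"$(( $(nproc) * 3 / 2 ))"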

distcc can also be used to distribute compiles, not only on the current machine, but also across other machines in a farm that have distcc installed.

I'm not sure about g++, but if you're using GNU Make then "make -j N" (where N is the number of threads make can create) will allow make to run multiple g++ jobs at the same time (as long as the files don't depend on each other).

GNU parallel

I was making a synthetic compilation benchmark and couldn't be bothered to write a Makefile, so I used:

sudo apt-get install parallel
ls | grep -E '\.c$' | parallel -t --will-cite "gcc -c -o '{.}.o' '{}'"

Explanation:

  • {.} takes the input argument and removes its extension
  • -t prints out the commands being run to give us an idea of progress
  • --will-cite removes the request to cite the software if you publish results using it...

parallel is so convenient that I could even do a timestamp check myself:

ls | grep -E '\.c$' | parallel -t --will-cite "\
  if ! [ -f '{.}.o' ] || [ '{}' -nt '{.}.o' ]; then
    gcc -c -o '{.}.o' '{}'
  fi
"

xargs -P can also run jobs in parallel, but it is a bit less convenient to do the extension manipulation or run multiple commands with it: Calling multiple commands through xargs
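
For comparison, a rough xargs equivalent of the one-liner above might look like the following sketch, where a sh -c wrapper performs the extension manipulation that xargs lacks:

ls | grep -E '\.c$' | xargs -P "$(nproc)" -I{} sh -c 'gcc -c -o "${1%.c}.o" "$1"' _ {}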

Parallel linking was asked at: Can gcc use multiple cores when linking?

TODO: I think I read somewhere that compilation can be reduced to matrix multiplication, so maybe it is also possible to speed up single file compilation for large files. But I can't find a reference now.

Tested in Ubuntu 18.10.
