
How can I program multiple cores in C?

I have a program and want to improve its runtime performance further.

let a = 1;
let b = 2;

let c = a + b;
let d = c + 2;

let e = 3;

let f = c + d;
let g = a + e;

Step 1: a, b and e are independent, so I want to compute them in parallel (on different cores).

Step 2: c depends on a and b, and g depends on a and e, but c and g are independent of each other, so c and g can be computed in parallel after step 1.

Step 3: d depends on c, so it is computed after step 2.

Step 4: f depends on c and d, so it is computed after step 3.

Can this be achieved in C, or does any programming language support it natively?

Multi-threading is clearly not suited to your problem. The synchronization and data-movement time is far greater than the time needed to add two native-typed values (e.g. floating-point numbers or integers). Indeed, adding two integers takes about 1 cycle on mainstream x86-64 processors, while moving data from the cache of one core to that of another takes at least several dozen cycles (if not hundreds, depending on the target architecture). Thus, using multiple cores will actually slow the code down massively. Multi-threading is only worth it for relatively coarse-grained computations (at least a few microseconds, and generally a bit more).

Fortunately, modern processors can execute multiple instructions in parallel per cycle (see instruction-level parallelism and superscalar processors). For example, an Intel Skylake core can execute 4 additions per cycle. It can also execute instructions out of order. The processor detects dependencies for you, so you do not need to do much: just make sure the instructions are independent so that they can be executed in parallel.
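As a rough illustration of this point, here is the question's computation written as ordinary single-threaded C. The names `struct results`, `compute` and `self_check` are made up for this sketch and do not come from the question. Nothing here requests parallelism explicitly; the code simply keeps c and g free of any dependency on each other, which leaves the compiler and an out-of-order core free to overlap them.

```c
#include <assert.h>

/* Hypothetical container for the values named in the question. */
struct results {
    int c, d, f, g;
};

/* Plain single-threaded version of the question's computation.
 * c and g do not depend on each other, so a superscalar core may
 * execute both additions in the same cycle. d and f form a
 * dependency chain and have to wait for their inputs. */
static struct results compute(int a, int b, int e)
{
    struct results r;
    r.c = a + b;     /* needs a and b */
    r.g = a + e;     /* needs a and e, independent of c */
    r.d = r.c + 2;   /* needs c */
    r.f = r.c + r.d; /* needs c and d */
    return r;
}

/* Small self-check using the question's values a = 1, b = 2, e = 3. */
static void self_check(void)
{
    struct results r = compute(1, 2, 3);
    assert(r.c == 3);
    assert(r.g == 4);
    assert(r.d == 5);
    assert(r.f == 8);
}
```

With constant inputs like these, an optimizing compiler will most likely fold the whole computation at compile time, so the effect only matters when the inputs are runtime values.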

The concept you are looking for is "multi-threading". Most modern programming languages have built-in support for it (for example, C++'s std::thread). C is a bit older, but multi-threading support was added in the 2011 standard update (known as C11) through the threads.h header. The new API is optional, and while many common C implementations do offer it, the one you use may not.
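A minimal sketch of the C11 API follows, assuming a toolchain that ships threads.h. The names `HAVE_C11_THREADS`, `add_worker` and `add_on_thread` are invented for this example. Because the header is optional, the sketch checks the standard `__STDC_NO_THREADS__` macro and, where the compiler offers it, `__has_include`, and falls back to the calling thread otherwise.

```c
#include <assert.h>

/* C11 threads are optional. An implementation without them may define
 * __STDC_NO_THREADS__; some toolchains lack the header without defining
 * the macro, so __has_include is also consulted when it exists. */
#if defined(__STDC_NO_THREADS__)
#  define HAVE_C11_THREADS 0
#elif defined(__has_include)
#  if __has_include(<threads.h>)
#    define HAVE_C11_THREADS 1
#  else
#    define HAVE_C11_THREADS 0
#  endif
#else
#  define HAVE_C11_THREADS 1
#endif

#if HAVE_C11_THREADS
#include <threads.h>

/* Thread entry point: adds the two ints that arg points to.
 * The return value is handed back through thrd_join. */
static int add_worker(void *arg)
{
    const int *in = arg;
    return in[0] + in[1];
}
#endif

/* Returns a + b, computed on a second thread when C11 threads are
 * available, and on the calling thread otherwise. */
static int add_on_thread(int a, int b)
{
#if HAVE_C11_THREADS
    int in[2] = { a, b };
    int sum = 0;
    thrd_t t;

    if (thrd_create(&t, add_worker, in) != thrd_success)
        return a + b; /* could not start a thread, compute inline */

    int rc = thrd_join(t, &sum);
    assert(rc == thrd_success);
    (void)rc; /* silences the unused warning when NDEBUG is set */
    return sum;
#else
    return a + b;
#endif
}
```

Depending on the platform, linking may need an extra flag such as -pthread. For a single addition the thread costs far more than it saves, so this only shows the shape of the API.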

Luckily, threads were used extensively in C before the 2011 standard update, and there are several non-standard threading APIs that you can use as libraries. For example, check out pthread (POSIX threads), a common cross-platform threading library for C.
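To make the pthread suggestion concrete, here is a sketch that maps the question's steps onto two POSIX threads. It assumes a POSIX system, and the names `struct add_task`, `run_add`, `struct threaded_results` and `compute_with_threads` are hypothetical. Step 2 (c and g) runs on two threads, and steps 3 and 4 (d and f) run on the calling thread after both joins.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical unit of work: one addition. */
struct add_task {
    int lhs, rhs, out;
};

/* Thread entry point: stores lhs + rhs in the task's out field. */
static void *run_add(void *arg)
{
    struct add_task *t = arg;
    t->out = t->lhs + t->rhs;
    return NULL;
}

struct threaded_results {
    int c, d, f, g;
};

static struct threaded_results compute_with_threads(int a, int b, int e)
{
    struct add_task task_c = { a, b, 0 }; /* c = a + b */
    struct add_task task_g = { a, e, 0 }; /* g = a + e */
    pthread_t th_c, th_g;
    int rc;

    /* Step 2: c and g are independent, so start one thread for each. */
    rc = pthread_create(&th_c, NULL, run_add, &task_c);
    assert(rc == 0);
    rc = pthread_create(&th_g, NULL, run_add, &task_g);
    assert(rc == 0);

    /* Wait for both threads; the joins make their results visible here. */
    rc = pthread_join(th_c, NULL);
    assert(rc == 0);
    rc = pthread_join(th_g, NULL);
    assert(rc == 0);
    (void)rc; /* silences the unused warning when NDEBUG is set */

    /* Steps 3 and 4: d and f form a chain, so they run sequentially. */
    struct threaded_results r;
    r.c = task_c.out;
    r.g = task_g.out;
    r.d = r.c + 2;
    r.f = r.c + r.d;
    return r;
}
```

Calling `compute_with_threads(1, 2, 3)` gives c = 3, g = 4, d = 5 and f = 8. For additions this small, expect it to run much slower than a plain sequential version, because creating and joining threads dominates the cost; the structure only pays off when each task does substantially more work. On many systems the program has to be built with -pthread.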
