
C++17 parallelism hardware implementation

As I understand it, C++17 will come with parallelism support. However, what I could not work out is whether this is tied to a specific kind of hardware parallelism (CPU by default), or whether it can be extended to any hardware with multiple computation units.

In other words, will we see something like, for example, an "nVidia C++ standard compiler" that compiles the parallel parts to be executed on GPUs?

Will it be a more standardized alternative to OpenCL, for example?

Note: I am absolutely not asking "Will nVidia do that?". I am asking whether the C++17 standard allows it and whether it is theoretically possible.

The question provides a link to the paper proposing this change, and, with respect to the parallelism aspects, there haven't been substantial changes to what was proposed. Yes, the compiler can do whatever makes sense for the target hardware to parallelize the execution of the various algorithms, provided only that it gets the right answer (with some reservations) and that it doesn't impose unneeded overhead (again, with some reservations).

There are a couple of important points to understand.

First, C++17 parallelism is not a general parallel programming mechanism. It provides parallel versions of many of the STL algorithms, nothing more. So it's not a replacement for more powerful mechanisms like OpenCL, TBB, etc.

Second, there are inherent limitations when you try to parallelize algorithms, and that's why I added those two parenthesized qualifications. For example, the parallel counterpart of std::accumulate (std::reduce — std::accumulate itself has no execution-policy overload) will produce the same result as the non-parallel version only if the function being applied to the input range is commutative and associative. The most obvious problem area here is floating-point values, where math operations are not associative, so the results might differ. Similarly, some algorithms actually impose more overhead when parallelized; you get a net speedup, but there is more total work done, so the speedup for those algorithms will not be linear in the number of processing units. std::partial_sum is an example: each output value depends on the preceding one, so the algorithm is not simple to parallelize. There are ways to do it, but you end up applying the combiner function more times than the non-parallel algorithm would. In general, the complexity requirements for the algorithms have been relaxed in order to reflect this reality.

