
C++: how to write code the compiler can easily optimize for SIMD?

I'm working in Visual Studio 2008, and in the project settings I see the option "activate Extended Instruction set", which I can set to None, SSE or SSE2.

So will the compiler try to batch instructions together in order to make use of SIMD instructions?

Are there any rules one can follow when writing code so that the compiler can generate efficient assembly using these extensions?

For example, I'm currently working on a raytracer. A shader takes some input and calculates an output color from it, like this:

PixelData data = RayTracer::gatherPixelData(pixel.x, pixel.y);
Color col = shadePixel(data);

Would it, for example, be beneficial to write the shader code so that it shades 4 different pixels within one call? Something like this:

PixelData data1 = RayTracer::gatherPixelData(pixel1.x, pixel1.y);
...
shadePixels(data1, data2, data3, data4, &col1out, &col2out, &col3out, &col4out);

to process multiple data units at once. Would this be beneficial for making the compiler use SSE instructions?

Thanks!

"I'm working in Visual Studio 2008, and in the project settings I see the option "activate Extended Instruction set", which I can set to None, SSE or SSE2."

"So the compiler will try to batch instructions together in order to make use of SIMD instructions?"

No, the compiler will not use vector instructions on its own. With that setting it will use scalar SSE instructions instead of x87 ones.

What you describe is called "automatic vectorization". Microsoft compilers (as of Visual Studio 2008) do not do this; Intel compilers do.

With the Microsoft compiler you can use intrinsics to perform manual SSE optimizations.
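As a rough illustration of what manual SSE code with intrinsics can look like, here is a minimal sketch. The function name `multiplyAdd` and the `HAVE_SSE_SKETCH` macro are made up for this example and do not come from the question; the intrinsics themselves (`_mm_loadu_ps`, `_mm_mul_ps`, `_mm_add_ps`, `_mm_storeu_ps`) are declared in `<xmmintrin.h>`. The sketch assumes nothing about alignment, so it uses unaligned loads and stores.

```cpp
#include <cstddef>

// Use the SSE header only where it exists; elsewhere fall back to scalar code.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // __m128, _mm_loadu_ps, _mm_mul_ps, _mm_add_ps, _mm_storeu_ps
#define HAVE_SSE_SKETCH 1  // hypothetical macro, local to this example
#endif

// Hypothetical helper (not from the original post): out[i] = a[i] * b[i] + c[i].
void multiplyAdd(const float* a, const float* b, const float* c,
                 float* out, std::size_t n)
{
    std::size_t i = 0;
#ifdef HAVE_SSE_SKETCH
    // Process four floats per iteration with packed SSE instructions.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        const __m128 vb = _mm_loadu_ps(b + i);
        const __m128 vc = _mm_loadu_ps(c + i);
        const __m128 result = _mm_add_ps(_mm_mul_ps(va, vb), vc);
        _mm_storeu_ps(out + i, result);         // unaligned store of 4 floats
    }
#endif
    // Scalar tail for the remaining elements (or all of them without SSE).
    for (; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];
}
```

If the arrays are known to be 16-byte aligned, the aligned variants `_mm_load_ps` and `_mm_store_ps` could be used instead; whether that is measurably faster depends on the CPU, so it is worth profiling.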

Three observations.

  1. The best speedups come not from low-level optimizations but from good algorithms, so make sure you get that part right first. Often this just means using the right libraries for your specific domain.

  2. Once your algorithms are right, it is time to measure. Often an 80/20 rule is at work: 20% of your code takes 80% of the execution time. To locate that part you need a good profiler. Intel VTune can give you a sampling profile of every function and reports that pinpoint the performance killers. A free alternative is AMD CodeAnalyst, if you have an AMD CPU.

  3. Compiler autovectorization is not a silver bullet. Although the compiler will try really hard (especially Intel C++), you will often need to help it by rewriting the algorithms in vector form. You can often get much better results by handcrafting small portions of the bottleneck code to use SIMD instructions. You can do that in C code using intrinsics (see VJo's link) or with inline assembly.

Of course, steps 2 and 3 form an iterative process. If you are really serious about this, there are some good books on the subject by Intel folks, such as The Software Optimization Cookbook, as well as the processor reference manuals.
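To make "rewriting the algorithms in vector form" a bit more concrete for the raytracer case, here is one possible sketch. It assumes a structure-of-arrays layout, with one contiguous float array per color component, and a trivial diffuse shading formula. The function `shadeDiffuseBatch` and all of its parameters are hypothetical and not part of the question's code. A simple, branch-free loop over contiguous arrays like this is the kind of code that vectorizing compilers tend to handle well, and it is also easy to convert to intrinsics by hand.

```cpp
#include <cstddef>

// Hypothetical batch shader (illustration only): scales each pixel's surface
// color by its clamped diffuse term. Inputs and outputs are stored as
// structure-of-arrays, so element i of every array belongs to pixel i.
void shadeDiffuseBatch(const float* r, const float* g, const float* b,
                       const float* nDotL,
                       float* outR, float* outG, float* outB,
                       std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        // Clamp negative lighting to zero; every pixel follows the same
        // sequence of operations, which keeps the loop body uniform.
        const float k = nDotL[i] > 0.0f ? nDotL[i] : 0.0f;
        outR[i] = r[i] * k;
        outG[i] = g[i] * k;
        outB[i] = b[i] * k;
    }
}
```

Compared with passing four separate `PixelData` objects and four output pointers, this layout lets the same function handle 4, 8 or any number of pixels per call. Note that the output arrays are assumed not to overlap the inputs; a compiler that cannot prove this may emit runtime overlap checks or skip vectorization.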

The compiler is not almighty, and it has some limitations. If it can (and if the right flags are passed to it), it will use SSE instructions. The only way to see what it did is to examine the assembly code generated by the compiler.
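One way to try this is to compile a tiny test function on its own and read the listing. The sketch below uses a made-up function `scaleArray` in a hypothetical file `scale.cpp`; the command lines in the comments are typical invocations and may need adjusting for your setup.

```cpp
#include <cstddef>

// Hypothetical test function: multiplies every element by a constant factor.
//
// Typical ways to get an assembly listing (file name scale.cpp is assumed):
//   Visual C++ :  cl /O2 /arch:SSE2 /FAs /c scale.cpp   (writes scale.asm)
//   GCC / Clang:  g++ -O2 -msse2 -S scale.cpp           (writes scale.s)
//
// In the listing, scalar SSE code shows up as instructions such as mulss,
// while vectorized code uses the packed form mulps on four floats at once.
void scaleArray(float* data, std::size_t n, float factor)
{
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= factor;
}
```

Keeping the function small and in its own file makes the listing short enough to read, and makes it easy to see how a change in the source or in the flags changes the generated instructions.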

Another option is to use the C SSE/SSE2 intrinsics. For Windows you can find them here:

http://msdn.microsoft.com/en-us/library/y0dh78ez%28VS.80%29.aspx
