

Auto vectorization: Region of interest (crop)

I have a library that contains several image processing algorithms, including a Region of Interest (crop) algorithm. When compiling with GCC, the auto-vectorizer speeds up a lot of the code but makes the crop algorithm slower. Is there a way to flag a particular loop so that the vectorizer ignores it, or is there a better way to structure the code for better performance?

for (RowIndex=0;RowIndex<Destination.GetRows();++RowIndex)
{
    // Offsets of the first ROI pixel of this row in the source and destination buffers
    rowOffsetS = ((OriginY + RowIndex) * SizeX) + OriginX;
    rowOffsetD = (RowIndex * Destination.GetColumns());
    for (ColumnIndex=0;ColumnIndex<Destination.GetColumns();++ColumnIndex)
    {
        // Copy one pixel from the source ROI row into the destination row
        BufferSPtr=BufferS + rowOffsetS + ColumnIndex;
        BufferDPtr=BufferD + rowOffsetD + ColumnIndex;
        *BufferDPtr=*BufferSPtr;
    }
}

Here, SizeX is the width of the source image, OriginX is the left edge (x coordinate) of the region of interest, and OriginY is the top edge (y coordinate) of the region of interest.
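
For reference, the loop above can be written as a self-contained function. This is only a sketch under stated assumptions: 8-bit pixels, row-major buffers, and a hypothetical name and parameter list (crop_naive, size_x, origin_x, origin_y, dst_rows, dst_cols) that are not the library's actual API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the question's loop (not the library's API).
// Copies a dst_rows x dst_cols region whose top-left corner is (origin_x, origin_y)
// out of a row-major source image that is size_x pixels wide.
// Assumption: pixels are unsigned char.
void crop_naive(const unsigned char* src, std::size_t size_x,
                std::size_t origin_x, std::size_t origin_y,
                unsigned char* dst, std::size_t dst_rows, std::size_t dst_cols)
{
    assert(origin_x + dst_cols <= size_x);  // ROI must fit inside a source row
    for (std::size_t row = 0; row < dst_rows; ++row)
    {
        // Start of this ROI row in the source and of the matching destination row
        const std::size_t row_offset_s = (origin_y + row) * size_x + origin_x;
        const std::size_t row_offset_d = row * dst_cols;
        for (std::size_t col = 0; col < dst_cols; ++col)
            dst[row_offset_d + col] = src[row_offset_s + col];
    }
}
```

For a 5-pixel-wide source filled with 0, 1, 2, ..., cropping 2 rows x 3 columns starting at (1, 2) yields 11, 12, 13, 16, 17, 18.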

I haven't found anything about changing the optimization flags for a single loop. However, according to the GCC documentation on the optimize function attribute, you can use that attribute on a function to override the optimization settings for that function, somewhat like this:

void foo() __attribute__((optimize("O2", "inline-functions")));
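
Applied to the crop case, a minimal sketch might switch off only auto-vectorization for one function. GCC treats an optimize string that does not start with "O" as an -f option, so "no-tree-vectorize" corresponds to -fno-tree-vectorize. The function name crop_no_vec and the macro NO_AUTO_VECTORIZE are hypothetical, and the pixel type is assumed to be unsigned char.

```cpp
#include <cassert>
#include <cstddef>

// optimize is a GCC-specific attribute, so it is wrapped in a (hypothetical) macro
// that expands to nothing on other compilers instead of producing warnings there.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_AUTO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_AUTO_VECTORIZE
#endif

// Hypothetical crop function: with GCC it is compiled as if -fno-tree-vectorize
// were given, while the rest of the translation unit keeps the command-line flags.
NO_AUTO_VECTORIZE
void crop_no_vec(const unsigned char* src, std::size_t size_x,
                 std::size_t origin_x, std::size_t origin_y,
                 unsigned char* dst, std::size_t dst_rows, std::size_t dst_cols)
{
    assert(origin_x + dst_cols <= size_x);  // ROI must fit inside a source row
    for (std::size_t row = 0; row < dst_rows; ++row)
    {
        const std::size_t row_offset_s = (origin_y + row) * size_x + origin_x;
        const std::size_t row_offset_d = row * dst_cols;
        for (std::size_t col = 0; col < dst_cols; ++col)
            dst[row_offset_d + col] = src[row_offset_s + col];
    }
}
```

Placing the attribute before the declaration specifiers lets it sit on the function definition itself; after the declarator (as in foo above) it is only accepted on a plain declaration.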

If you want to change it for several functions, you can use #pragma GCC optimize, which applies to all functions defined after it (see the GCC documentation on function-specific option pragmas).
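
A sketch of the pragma approach, assuming GCC: #pragma GCC push_options saves the current settings, #pragma GCC optimize changes them for every function defined afterwards, and #pragma GCC pop_options restores them, so the scope stays limited to the crop code. The names copy_row_scalar and crop_pragma are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// GCC-only pragmas; other compilers simply compile the functions below normally.
#if defined(__GNUC__) && !defined(__clang__)
// Save the current (command-line) options ...
#pragma GCC push_options
// ... and disable auto-vectorization for every function defined from here on.
#pragma GCC optimize("no-tree-vectorize")
#endif

// Hypothetical helper: copies one ROI row pixel by pixel.
void copy_row_scalar(const unsigned char* src, unsigned char* dst, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

// Hypothetical crop function built on the helper; also compiled without vectorization.
void crop_pragma(const unsigned char* src, std::size_t size_x,
                 std::size_t origin_x, std::size_t origin_y,
                 unsigned char* dst, std::size_t dst_rows, std::size_t dst_cols)
{
    assert(origin_x + dst_cols <= size_x);  // ROI must fit inside a source row
    for (std::size_t row = 0; row < dst_rows; ++row)
        copy_row_scalar(src + (origin_y + row) * size_x + origin_x,
                        dst + row * dst_cols, dst_cols);
}

#if defined(__GNUC__) && !defined(__clang__)
// Restore the saved options for the rest of the file.
#pragma GCC pop_options
#endif
```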

So you should be able to compile the function containing the crop loop with a different set of optimization flags that leave out auto-vectorization (for example, -fno-tree-vectorize). The disadvantage is that the compilation flags for that function are hardcoded in the source, but it is the best option I found.

As for restructuring the code for better performance, the two points I already mentioned in the comments come to mind (assuming the source and destination ranges can't overlap):

  • Declaring the pointers as __restrict to tell the compiler that they don't alias (the memory pointed to by one pointer won't be accessed by any other means inside the function). The possibility of pointer aliasing is a major stumbling block for the optimizer: it can't easily reorder the accesses if it doesn't know whether writing to BufferD will change the contents of BufferS.

  • Replacing the inner loop with a call to std::copy:

     std::copy(BufferS + rowOffsetS, BufferS + rowOffsetS + Destination.GetColumns(), BufferD + rowOffsetD); 

    std::copy is likely to be well optimized (for trivially copyable types it probably forwards the arguments to memmove), so it might make your code faster while also making it shorter (always a plus).
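
Both suggestions can be combined in one restructured function. This is a sketch under assumptions: unsigned char pixels, row-major buffers, non-overlapping source and destination, and a hypothetical name (crop_copy). Note that __restrict is a compiler extension supported by GCC, Clang and MSVC, not standard C++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical restructured crop. __restrict promises the compiler that src and dst
// never refer to overlapping memory; each ROI row is contiguous in both buffers,
// so one std::copy call per row replaces the per-pixel inner loop.
void crop_copy(const unsigned char* __restrict src, std::size_t size_x,
               std::size_t origin_x, std::size_t origin_y,
               unsigned char* __restrict dst, std::size_t dst_rows, std::size_t dst_cols)
{
    assert(origin_x + dst_cols <= size_x);  // ROI must fit inside a source row
    for (std::size_t row = 0; row < dst_rows; ++row)
    {
        // First pixel of this ROI row in the source image
        const unsigned char* src_row = src + (origin_y + row) * size_x + origin_x;
        std::copy(src_row, src_row + dst_cols, dst + row * dst_cols);
    }
}
```

Hoisting the row length into a plain parameter (dst_cols) also avoids re-evaluating Destination.GetColumns() on every iteration, which the compiler may not be able to do on its own when aliasing is possible.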
