
Optimize a nearest neighbor resizing algorithm for speed

I'm using the following algorithm to perform nearest neighbor resizing. Is there any way to optimize its speed? The input and output buffers are in ARGB format, though the images are known to always be opaque. Thank you.

void resizeNearestNeighbor(const uint8_t* input, uint8_t* output, int sourceWidth, int sourceHeight, int targetWidth, int targetHeight)
{
    const int x_ratio = (int)((sourceWidth << 16) / targetWidth);
    const int y_ratio = (int)((sourceHeight << 16) / targetHeight) ;
    const int colors = 4;

    for (int y = 0; y < targetHeight; y++)
    {
        int y2_xsource = ((y * y_ratio) >> 16) * sourceWidth;
        int i_xdest = y * targetWidth;

        for (int x = 0; x < targetWidth; x++)
        {
            int x2 = ((x * x_ratio) >> 16) ;
            int y2_x2_colors = (y2_xsource + x2) * colors;
            int i_x_colors = (i_xdest + x) * colors;

            output[i_x_colors]     = input[y2_x2_colors];
            output[i_x_colors + 1] = input[y2_x2_colors + 1];
            output[i_x_colors + 2] = input[y2_x2_colors + 2];
            output[i_x_colors + 3] = input[y2_x2_colors + 3];
        }
    }
}

The restrict keyword will help a lot, assuming the input and output buffers do not alias.
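As a rough illustration of that suggestion, here is a minimal sketch of the inner loop with both pointers restrict-qualified. The helper name sample_row and the macro NN_RESTRICT are made up for this example. Because uint8_t pointers may alias anything, a compiler without this promise has to assume that a store to d[0] could change s[1]; with it, the compiler is free to merge the four byte copies, though whether it actually does depends on the compiler and optimization level.

```c
#include <stdint.h>

/* Hypothetical portability macro: C99 spells the qualifier `restrict`,
   MSVC spells it `__restrict`. */
#if defined(_MSC_VER)
#define NN_RESTRICT __restrict
#else
#define NN_RESTRICT restrict
#endif

/* Hypothetical helper: resamples one row of 4-byte pixels.
   x_ratio is the 16.16 fixed-point step, as in the question's code.
   The caller promises that srcRow and dstRow do not overlap. */
void sample_row(const uint8_t* NN_RESTRICT srcRow, uint8_t* NN_RESTRICT dstRow,
                int targetWidth, int x_ratio)
{
    for (int x = 0; x < targetWidth; x++) {
        const uint8_t* s = srcRow + 4 * ((x * x_ratio) >> 16);
        uint8_t* d = dstRow + 4 * x;
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
    }
}
```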

Another improvement is to declare two additional pointers, pointerToOutput and pointerToInput, as uint32_t*, so that the four 8-bit copy-assignments can be combined into a single 32-bit one, assuming the pointers are 32-bit aligned.
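A minimal sketch of that idea, assuming the caller can hand over the buffers as 4-byte-aligned uint32_t arrays in the first place (the function name resize_nn_u32 is made up for this example). Each pixel then costs one load and one store instead of four of each.

```c
#include <stdint.h>

/* Hypothetical variant of the question's function: pixels are addressed
   as 32-bit words. Assumes both buffers are 4-byte aligned and that the
   dimensions are small enough for the 16.16 fixed-point math not to
   overflow an int, as in the original code. */
void resize_nn_u32(const uint32_t* input, uint32_t* output,
                   int sourceWidth, int sourceHeight,
                   int targetWidth, int targetHeight)
{
    const int x_ratio = (sourceWidth << 16) / targetWidth;
    const int y_ratio = (sourceHeight << 16) / targetHeight;

    for (int y = 0; y < targetHeight; y++) {
        const uint32_t* srcRow = input + ((y * y_ratio) >> 16) * sourceWidth;
        uint32_t* dstRow = output + y * targetWidth;

        for (int x = 0; x < targetWidth; x++) {
            /* One 32-bit copy replaces the four 8-bit assignments. */
            dstRow[x] = srcRow[(x * x_ratio) >> 16];
        }
    }
}
```

If the data only exists as uint8_t buffers, casting the pointers is the approach described above; it relies on the alignment assumption and on the compiler tolerating the type-punned access.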

There's little that you can do to speed this up, as you already arranged the loops in the right order and cleverly used fixed-point arithmetic. As others suggested, try to move the 32 bits in a single go (in case the compiler has not already done that for you).

In case of significant enlargement, there is a possibility: you can determine how many times every source pixel needs to be replicated (you'll need to work on the properties of the relation Xd=Wd.Xs/Ws in integers), and perform a single pixel read for k writes. This also works on the y's, and you can memcpy the identical rows instead of recomputing them. You can precompute and tabulate the mappings of the X's and Y's using run-length coding.
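The enlargement idea could be sketched roughly as follows. This is one possible reading, not the answerer's code: the function name resize_nn_rowcopy is made up, pixels are assumed to be 4-byte-aligned 32-bit words, and the X mapping is kept as a plain lookup table rather than a run-length-coded one. Runs of equal table entries give the "one read, k writes" pattern, and a destination row that maps to the same source row as the one above it is duplicated with memcpy.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, -1 if the lookup table cannot be allocated. */
int resize_nn_rowcopy(const uint32_t* input, uint32_t* output,
                      int sourceWidth, int sourceHeight,
                      int targetWidth, int targetHeight)
{
    const int x_ratio = (sourceWidth << 16) / targetWidth;
    const int y_ratio = (sourceHeight << 16) / targetHeight;

    /* Tabulate the source column of every destination column once. */
    int* xmap = malloc((size_t)targetWidth * sizeof *xmap);
    if (xmap == NULL) return -1;
    for (int x = 0; x < targetWidth; x++)
        xmap[x] = (x * x_ratio) >> 16;

    int prevSourceY = -1;
    for (int y = 0; y < targetHeight; y++) {
        const int sourceY = (y * y_ratio) >> 16;
        uint32_t* dstRow = output + (size_t)y * targetWidth;

        if (sourceY == prevSourceY) {
            /* Same source row as the previous destination row: copy it. */
            memcpy(dstRow, dstRow - targetWidth,
                   (size_t)targetWidth * sizeof *dstRow);
            continue;
        }

        const uint32_t* srcRow = input + (size_t)sourceY * sourceWidth;
        int x = 0;
        while (x < targetWidth) {
            const int sx = xmap[x];
            const uint32_t pixel = srcRow[sx];  /* single read ... */
            do {
                dstRow[x++] = pixel;            /* ... for k writes */
            } while (x < targetWidth && xmap[x] == sx);
        }
        prevSourceY = sourceY;
    }

    free(xmap);
    return 0;
}
```

When shrinking, every run has length one and no rows repeat, so the table and the extra comparisons are pure overhead; the sketch only has a chance of paying off for enlargement.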

But there is a barrier that you will not get past: you need to fill the whole destination image.

If you are desperately looking for a speedup, there remains the option of using vector operations (SSE or AVX) to handle several pixels at a time. Shuffle instructions are available that might make it possible to control the replication (or decimation) of the pixels. But due to the complicated replication pattern combined with the fixed structure of the vector registers, you will probably need to integrate a complex decision table.

The algorithm is fine, but you can take advantage of massive parallelization by submitting your image to the GPU. If you use OpenGL, simply creating a context of the new size and providing a properly sized quad can give you nearest neighbor sampling for free. OpenGL could also give you access to other resizing sampling techniques by simply changing the properties of the texture you read from (this would amount to a single GL command, which could be an easy parameter to your resize function).

Later in development, you could also simply swap in a different shader for other blending techniques, which keeps the work on the GPU, where image processing runs well.

Also, since you aren't using any fancy geometry, writing the program becomes almost trivial. It would be a little more involved than your algorithm, but it could perform orders of magnitude faster, depending on image size.

I hope I didn't break anything. This combines some of the suggestions posted thus far and is about 30% faster. I'm amazed that is all we got. I did not actually check the destination image to see if it was right.

Changes:
- remove multiplies from the inner loop (10% improvement)
- uint32_t instead of uint8_t (10% improvement)
- __restrict keyword (1% improvement)

This was on an i7 x64 machine running Windows, compiled with MSVC 2013. You will have to change the __restrict keyword for other compilers.

void resizeNearestNeighbor2_32(const uint8_t* __restrict input, uint8_t* __restrict output, int sourceWidth, int sourceHeight, int targetWidth, int targetHeight)
{
    // View each 4-byte ARGB pixel as one 32-bit word (assumes 4-byte-aligned buffers).
    const uint32_t* input32 = (const uint32_t*)input;
    uint32_t* output32 = (uint32_t*)output;

    const int x_ratio = (int)((sourceWidth << 16) / targetWidth);
    const int y_ratio = (int)((sourceHeight << 16) / targetHeight);

    for (int y = 0; y < targetHeight; y++)
    {
        int y2_xsource = ((y * y_ratio) >> 16) * sourceWidth;
        int i_xdest = y * targetWidth;

        // 16.16 fixed-point source x position, advanced by addition instead of a multiply.
        int source_x_offset = 0;
        const uint32_t* inputLine = input32 + y2_xsource;
        for (int x = 0; x < targetWidth; x++)
        {
            int sourceOffset = source_x_offset >> 16;

            // Write through the 32-bit pointer, and only then advance both positions.
            output32[i_xdest] = inputLine[sourceOffset];

            i_xdest += 1;
            source_x_offset += x_ratio;
        }
    }
}
