How to use "parallel for" instead of several "for" loops?

Question

I am trying to write a faster version of a Sobel filter, but I don't understand how to use "parallel for" when there are several nested for loops.

Should I use as many parallel for loops as there are loops?

Would that be effective?

Can somebody explain it using my code? Here it is:

for (int y = 0; y < Image.Height; y++)
{
    for (int x = 0; x < Image.Width * 3; x += 3)
    {
        r_x = g_x = b_x = 0; //reset the gradients in x-direction values
        r_y = g_y = b_y = 0; //reset the gradients in y-direction values
        location = x + y * ImageData.Stride; //to get the location of any pixel >> location = x + y * Stride
        for (int yy = -(int)Math.Floor(weights_y.GetLength(0) / 2.0d), yyy = 0; yy <= (int)Math.Floor(weights_y.GetLength(0) / 2.0d); yy++, yyy++)
        {
            if (y + yy >= 0 && y + yy < Image.Height) //to prevent crossing the bounds of the array
            {
                for (int xx = -(int)Math.Floor(weights_x.GetLength(1) / 2.0d) * 3, xxx = 0; xx <= (int)Math.Floor(weights_x.GetLength(1) / 2.0d) * 3; xx += 3, xxx++)
                {
                    if (x + xx >= 0 && x + xx <= Image.Width * 3 - 3) //to prevent crossing the bounds of the array
                    {
                        location2 = x + xx + (yy + y) * ImageData.Stride; //location of the neighboring pixel >> location2 = (x + xx) + (y + yy) * Stride

                        sbyte weight_x = weights_x[yyy, xxx];
                        sbyte weight_y = weights_y[yyy, xxx];
                        //applying the same weight to all channels
                        b_x += buffer[location2] * weight_x;
                        g_x += buffer[location2 + 1] * weight_x; //G_X
                        r_x += buffer[location2 + 2] * weight_x;
                        b_y += buffer[location2] * weight_y;
                        g_y += buffer[location2 + 1] * weight_y;//G_Y
                        r_y += buffer[location2 + 2] * weight_y;
                    }
                }
            }
        }
        //getting the magnitude for each channel
        b = (int)Math.Sqrt(Math.Pow(b_x, 2) + Math.Pow(b_y, 2));
        g = (int)Math.Sqrt(Math.Pow(g_x, 2) + Math.Pow(g_y, 2));//G
        r = (int)Math.Sqrt(Math.Pow(r_x, 2) + Math.Pow(r_y, 2));

        if (b > 255) b = 255;
        if (g > 255) g = 255;
        if (r > 255) r = 255;

        //getting grayscale value
        grayscale = (b + g + r) / 3;

        //thresholding to clean up the background
        //if (grayscale < 80) grayscale = 0;
        buffer2[location] = (byte)grayscale;
        buffer2[location + 1] = (byte)grayscale;
        buffer2[location + 2] = (byte)grayscale;
        //thresholding to clean up the background
        //if (b < 100) b = 0;
        //if (g < 100) g = 0;
        //if (r < 100) r = 0;

        //buffer2[location] = (byte)b;
        //buffer2[location + 1] = (byte)g;
        //buffer2[location + 2] = (byte)r;
    }
}

Answer 1

The most important questions are: is the work trivially parallelizable, and does the object model you're using support concurrency? Work that is purely math-related and whose outcomes aren't cumulative tends to be parallelizable, but I can't comment on the object model's thread-safety. It isn't guaranteed (and the default is usually "no").

As for where:

There is very little point in having nested parallelism; parallelism has overheads, and magnifying those overheads is counter-productive. The most effective way to treat parallelism is to think "chunky", i.e. a relatively small number of non-trivial operations (but hopefully at least as many as available CPU cores), rather than huge numbers of trivial operations. As such, the most effective place to put parallelism is usually the outermost loop.
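To illustrate the "chunky" idea, here is a minimal sketch, assuming the standard .NET Parallel.For from System.Threading.Tasks. The class OuterLoopParallelism and the method RowSums are hypothetical names invented for this example and have nothing to do with the Sobel code in the question. Only the outer loop is parallel; the inner loop stays an ordinary for loop, so each row is one reasonably sized work item.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class OuterLoopParallelism
{
    // Sums each row of a rectangular array.
    // Only the outer (row) loop is parallel; the inner loop stays sequential.
    public static long[] RowSums(int[,] data)
    {
        int rows = data.GetLength(0);
        int cols = data.GetLength(1);
        var sums = new long[rows];

        Parallel.For(0, rows, y =>
        {
            long sum = 0; // declared inside the body, so not shared between threads
            for (int x = 0; x < cols; x++)
            {
                sum += data[y, x];
            }
            sums[y] = sum; // each iteration writes only its own slot
        });

        return sums;
    }
}
```

Wrapping the inner loop in a second Parallel.For would create one tiny work item per element, which is the overhead-magnifying pattern described above.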

However! Note that you need to avoid shared state: r_x, g_x and b_x, and the same for the _y parts (and any other shared locals), would need to be declared inside the parallel part to ensure that they are independent. Other things to look at: grayscale, location, location2, r, g, b, yyy, xxx. It would be good to see where these things are currently declared, but my suspicion is that they'd all need to be moved so that they're declared inside the parallel portion. Check all locals that are declared, and all fields that are accessed.

It looks like buffer and buffer2 are simply input/output arrays, in which case they should work OK here: buffer is only read, and each iteration writes to its own locations in buffer2.
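Putting the advice together, the following is a rough sketch of how the question's loop might look with a parallel outer loop, not a drop-in replacement. It rests on several assumptions that are not in the question: the kernels are the standard 3x3 Sobel weights, the pixel data is a 24bpp byte array, and width, height and stride are passed in as plain integers so that no image object is touched inside the parallel body. The class ParallelSobel and the method Apply are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch; kernel values and method signature are assumptions.
public static class ParallelSobel
{
    // Standard 3x3 Sobel kernels (assumed; the question does not show its weights).
    private static readonly sbyte[,] WeightsX = { { -1, 0, 1 }, { -2, 0, 2 }, { -1, 0, 1 } };
    private static readonly sbyte[,] WeightsY = { { -1, -2, -1 }, { 0, 0, 0 }, { 1, 2, 1 } };

    // source: 24bpp pixel data (3 bytes per pixel), with 'stride' bytes per row.
    // Returns a new buffer holding the gray edge magnitude in all three channels.
    public static byte[] Apply(byte[] source, int width, int height, int stride)
    {
        var result = new byte[source.Length];

        // Parallelize the outermost loop only: one work item per image row.
        Parallel.For(0, height, y =>
        {
            for (int x = 0; x < width * 3; x += 3)
            {
                // Every working variable is declared inside the parallel body.
                int bX = 0, gX = 0, rX = 0;
                int bY = 0, gY = 0, rY = 0;

                for (int ky = 0; ky < 3; ky++)
                {
                    int sy = y + ky - 1;
                    if (sy < 0 || sy >= height) continue; // skip rows outside the image

                    for (int kx = 0; kx < 3; kx++)
                    {
                        int sx = x + (kx - 1) * 3;
                        if (sx < 0 || sx > width * 3 - 3) continue; // skip columns outside the image

                        int src = sx + sy * stride;
                        int wx = WeightsX[ky, kx];
                        int wy = WeightsY[ky, kx];

                        bX += source[src] * wx;
                        gX += source[src + 1] * wx;
                        rX += source[src + 2] * wx;
                        bY += source[src] * wy;
                        gY += source[src + 1] * wy;
                        rY += source[src + 2] * wy;
                    }
                }

                // Gradient magnitude per channel, clamped to the byte range.
                int b = Math.Min(255, (int)Math.Sqrt(bX * bX + bY * bY));
                int g = Math.Min(255, (int)Math.Sqrt(gX * gX + gY * gY));
                int r = Math.Min(255, (int)Math.Sqrt(rX * rX + rY * rY));
                byte gray = (byte)((b + g + r) / 3);

                // Each iteration writes only to its own three bytes of the output.
                int dst = x + y * stride;
                result[dst] = gray;
                result[dst + 1] = gray;
                result[dst + 2] = gray;
            }
        });

        return result;
    }
}
```

The input array is only read and every iteration writes to a distinct position in the output, so no locking is needed. Whether this is actually faster depends on image size and core count, so it is worth measuring against the sequential version.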
