How can I use "parallel for" instead of using a few "for"?

Question

I am trying to write faster code for a Sobel filter, but I don't understand how to use Parallel.For when there are several nested for loops.

Should I use as many Parallel.For calls as there are loops?

Would that be effective?

Can somebody explain it using my code? Here is the code:

for (int y = 0; y < Image.Height; y++)
{
    for (int x = 0; x < Image.Width * 3; x += 3)
    {
        r_x = g_x = b_x = 0; //reset the gradients in x-direction values
        r_y = g_y = b_y = 0; //reset the gradients in y-direction values
        location = x + y * ImageData.Stride; //to get the location of any pixel >> location = x + y * Stride
        for (int yy = -(int)Math.Floor(weights_y.GetLength(0) / 2.0d), yyy = 0; yy <= (int)Math.Floor(weights_y.GetLength(0) / 2.0d); yy++, yyy++)
        {
            if (y + yy >= 0 && y + yy < Image.Height) //to prevent crossing the bounds of the array
            {
                for (int xx = -(int)Math.Floor(weights_x.GetLength(1) / 2.0d) * 3, xxx = 0; xx <= (int)Math.Floor(weights_x.GetLength(1) / 2.0d) * 3; xx += 3, xxx++)
                {
                    if (x + xx >= 0 && x + xx <= Image.Width * 3 - 3) //to prevent crossing the bounds of the array
                    {
                        location2 = x + xx + (yy + y) * ImageData.Stride; //to get the location of any pixel >> location = x + y * Stride

                        sbyte weight_x = weights_x[yyy, xxx];
                        sbyte weight_y = weights_y[yyy, xxx];
                        //applying the same weight to all channels
                        b_x += buffer[location2] * weight_x;
                        g_x += buffer[location2 + 1] * weight_x; //G_X
                        r_x += buffer[location2 + 2] * weight_x;
                        b_y += buffer[location2] * weight_y;
                        g_y += buffer[location2 + 1] * weight_y;//G_Y
                        r_y += buffer[location2 + 2] * weight_y;
                    }
                }
            }
        }
        //getting the magnitude for each channel
        b = (int)Math.Sqrt(Math.Pow(b_x, 2) + Math.Pow(b_y, 2));
        g = (int)Math.Sqrt(Math.Pow(g_x, 2) + Math.Pow(g_y, 2));//G
        r = (int)Math.Sqrt(Math.Pow(r_x, 2) + Math.Pow(r_y, 2));

        if (b > 255) b = 255;
        if (g > 255) g = 255;
        if (r > 255) r = 255;

        //getting grayscale value
        grayscale = (b + g + r) / 3;

        //thresholding to clean up the background
        //if (grayscale < 80) grayscale = 0;
        buffer2[location] = (byte)grayscale;
        buffer2[location + 1] = (byte)grayscale;
        buffer2[location + 2] = (byte)grayscale;
        //thresholding to clean up the background
        //if (b < 100) b = 0;
        //if (g < 100) g = 0;
        //if (r < 100) r = 0;

        //buffer2[location] = (byte)b;
        //buffer2[location + 1] = (byte)g;
        //buffer2[location + 2] = (byte)r;
    }
}

Answer 1

The most important questions are: is the work trivially parallelizable, and does the object model you're using support concurrency? Work that is purely mathematical, and whose results don't accumulate from one iteration to the next, tends to be parallelizable, but I can't comment on the object model's thread-safety. It isn't guaranteed (and the default is usually "no").

As for where:

There is very little point in nesting parallelism; parallelism has overheads, and magnifying those overheads is counter-productive. The most effective way to treat parallelism is to think "chunky", i.e. a relatively small number of non-trivial operations (but hopefully at least as many as there are available CPU cores), rather than a huge number of trivial operations. As such, the most effective place to put parallelism is usually the outermost loop.
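To illustrate the "outermost loop only" idea, here is a minimal sketch, not the code from the question. The class name RowParallelSketch and the method InvertRows are hypothetical, and the buffer is assumed to be row-major with stride bytes per row. Only the loop over rows becomes a Parallel.For; the inner loop stays an ordinary for loop.

```csharp
using System;
using System.Threading.Tasks;

static class RowParallelSketch
{
    // Hypothetical helper: inverts every byte of a row-major buffer.
    // Assumes input.Length >= height * stride.
    public static byte[] InvertRows(byte[] input, int height, int stride)
    {
        var output = new byte[input.Length];

        // One parallel loop over rows: each row is a "chunky" unit of work.
        Parallel.For(0, height, y =>
        {
            // Declared inside the lambda, so each iteration has its own copy.
            int rowStart = y * stride;

            // The inner loop is deliberately left sequential.
            for (int x = 0; x < stride; x++)
            {
                output[rowStart + x] = (byte)(255 - input[rowStart + x]);
            }
        });

        return output;
    }
}
```

With this shape there are height work items, which is normally far more than the number of cores, and the per-item overhead of Parallel.For is paid once per row rather than once per pixel.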

However! Note that you need to avoid shared state: r_x, g_x and b_x, and likewise the _y variables (and any other shared locals), would need to be declared inside the parallel part to ensure that they are independent. Other things to look at: grayscale, location, location2, r, g, b, yyy, xxx. It would be good to see where these are currently declared, but my suspicion is that they'd all need to be moved so that they're declared inside the parallel portion. Check all locals that are declared, and all fields that are accessed.

It looks like buffer and buffer2 are simply input/output arrays, in which case they should work OK here: buffer is only read, and each iteration writes to its own distinct positions in buffer2.
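Putting these points together, the question's loop might be restructured along the following lines. This is a sketch under assumptions, not a drop-in replacement: the class name SobelSketch and the method Apply are hypothetical, the pixel data is assumed to be 24bpp BGR, and the standard 3x3 Sobel kernels are assumed because the question does not show the contents of weights_x and weights_y.

```csharp
using System;
using System.Threading.Tasks;

static class SobelSketch
{
    // Assumed standard 3x3 Sobel kernels; the question's weights are not shown.
    static readonly sbyte[,] WeightsX = { { -1, 0, 1 }, { -2, 0, 2 }, { -1, 0, 1 } };
    static readonly sbyte[,] WeightsY = { { -1, -2, -1 }, { 0, 0, 0 }, { 1, 2, 1 } };

    // buffer: 24bpp BGR pixel data (assumed layout), stride bytes per row.
    public static byte[] Apply(byte[] buffer, int width, int height, int stride)
    {
        var buffer2 = new byte[buffer.Length];

        // Only the outermost loop is parallel.
        Parallel.For(0, height, y =>
        {
            for (int x = 0; x < width * 3; x += 3)
            {
                // All working variables live inside the lambda: nothing is shared.
                int bX = 0, gX = 0, rX = 0, bY = 0, gY = 0, rY = 0;
                int location = x + y * stride;

                for (int yy = -1; yy <= 1; yy++)
                {
                    if (y + yy < 0 || y + yy >= height) continue; // skip rows outside the image
                    for (int xx = -1; xx <= 1; xx++)
                    {
                        int nx = x + xx * 3;
                        if (nx < 0 || nx > width * 3 - 3) continue; // skip columns outside the image

                        int location2 = nx + (y + yy) * stride;
                        int wx = WeightsX[yy + 1, xx + 1];
                        int wy = WeightsY[yy + 1, xx + 1];

                        bX += buffer[location2] * wx;
                        gX += buffer[location2 + 1] * wx;
                        rX += buffer[location2 + 2] * wx;
                        bY += buffer[location2] * wy;
                        gY += buffer[location2 + 1] * wy;
                        rY += buffer[location2 + 2] * wy;
                    }
                }

                // Gradient magnitude per channel, clamped to the byte range.
                int b = Math.Min(255, (int)Math.Sqrt(bX * bX + bY * bY));
                int g = Math.Min(255, (int)Math.Sqrt(gX * gX + gY * gY));
                int r = Math.Min(255, (int)Math.Sqrt(rX * rX + rY * rY));

                byte grayscale = (byte)((b + g + r) / 3);

                // Each (x, y) writes only to its own three bytes of buffer2.
                buffer2[location] = grayscale;
                buffer2[location + 1] = grayscale;
                buffer2[location + 2] = grayscale;
            }
        });

        return buffer2;
    }
}
```

The input array is only read and every iteration writes to a disjoint part of the output array, so no locking is needed. Whether this is actually faster depends on the image size and the machine, so it is worth timing against the sequential version.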

solution1
1 2021-10-20 09:00:30
