Parallel class does not provide any speed up

I'm trying to create a method that filters pixels against a given grayscale threshold: everything below it becomes black and everything above it becomes white. The method works, but it is not as fast as I feel it could be.

I decided to use the Parallel class, but no matter what I set MaxDegreeOfParallelism to, I don't get any speed benefit. I perform some other operations on the bitmap too, and the total time of all operations is always around 170 ms, whatever the value of MaxDegreeOfParallelism. When debugging, this filtering alone takes around 160 ms, so I would expect a noticeable overall difference.

I'm using an i7 processor with 4 physical cores and 8 logical cores.

The code:

Color black = System.Drawing.Color.FromArgb(0, 0, 0);
Color white = System.Drawing.Color.FromArgb(255, 255, 255);

int lowerBound = (int)((float)lowerBoundPercent * 255.0 / 100.0);
int upperBound = (int)((float)upperBoundPercent * 255.0 / 100.0);

int[][] border = new int[8][];
for (int i=0;i<8;i++)
{
    border[i] = new int[] { i*height/8, (i+1)*height/8-1};
}

Parallel.For(0, 8, new ParallelOptions { MaxDegreeOfParallelism = 8 }, i =>
    {
        for (int k = 0; k < width; k++)
        {
            for (int j = border[i][0]; j <= border[i][1]; j++)
            {
                Color pixelColor;
                int grayscaleValue;
                pixelColor = color[k][j];
                grayscaleValue = (pixelColor.R + pixelColor.G + pixelColor.B) / 3;
                if (grayscaleValue >= lowerBound && grayscaleValue <= upperBound)
                    color[k][j] = white;
                else
                    color[k][j] = black;
            }
        }
    });

color[][] is a jagged array of System.Drawing.Color.

The question: is this normal? If not, what can I do to change it?

EDIT:

Pixel extraction:

Color[][] color;
color = new Color[bitmap.Width][];
for (int i = 0; i < bitmap.Width; i++)
{
    color[i] = new Color[bitmap.Height];
    for (int j = 0; j < bitmap.Height; j++)
    {
        color[i][j] = bitmap.GetOriginalPixel(i, j);
    }
}

bitmap is an instance of my own Bitmap class:

public class Bitmap
{
    System.Drawing.Bitmap processed;
    //...
    public Color GetOriginalPixel(int x, int y) { return processed.GetPixel(x, y); }
    //...
}

To answer your main question about why your parallel method is not any faster: Parallel.For starts out with only one thread and adds more threads as it detects that they may be beneficial in speeding up the work. Note that the option is MaxDegreeOfParallelism, not just DegreeOfParallelism. Quite simply, there are not enough iterations of the loop for it to spin up enough threads to be effective; you need to give each iteration less work to do.
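
One way to see that MaxDegreeOfParallelism is only an upper bound is to measure how many loop bodies actually overlap. The sketch below is hypothetical: the ParallelismProbe name, its parameters, and the Thread.Sleep stand-in for real work are illustrative and not part of the question's code, and the reported peak will vary by machine and run.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelismProbe
{
    // Runs `iterations` loop bodies and returns the highest number of bodies
    // that were executing at the same moment.
    public static int PeakConcurrency(int iterations, int maxDegree, int workMs)
    {
        int running = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        Parallel.For(0, iterations, options, i =>
        {
            int now = Interlocked.Increment(ref running);

            // Raise the recorded peak if this body pushed concurrency higher.
            int seen = Volatile.Read(ref peak);
            while (now > seen)
            {
                int prior = Interlocked.CompareExchange(ref peak, now, seen);
                if (prior == seen) break;
                seen = prior;
            }

            Thread.Sleep(workMs); // stand-in for real per-iteration work (assumption)
            Interlocked.Decrement(ref running);
        });

        return peak;
    }

    static void Main()
    {
        // Few coarse iterations versus many fine ones, same upper bound of 8.
        Console.WriteLine(PeakConcurrency(8, 8, 5));
        Console.WriteLine(PeakConcurrency(800, 8, 5));
    }
}
```

The returned peak never exceeds maxDegree, but nothing guarantees that it reaches it, which is the point the answer makes about a loop with only 8 iterations.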

Try giving the parallel operation more iterations to work with by looping over the width instead of over 8 chunks of the height.

Color black = System.Drawing.Color.FromArgb(0, 0, 0);
Color white = System.Drawing.Color.FromArgb(255, 255, 255);

// Bounds are scaled by 3 so the loop can compare the raw R+G+B sum and skip the division.
int lowerBound = (int)((float)lowerBoundPercent * 255.0 / 100.0) * 3;
int upperBound = (int)((float)upperBoundPercent * 255.0 / 100.0) * 3;

Parallel.For(0, width, k =>
    {
        for (int j = 0; j < height; j++)
        {
            Color pixelColor;
            int grayscaleValue;
            pixelColor = color[k][j];
            grayscaleValue = (pixelColor.R + pixelColor.G + pixelColor.B);
            if (grayscaleValue >= lowerBound && grayscaleValue <= upperBound)
                color[k][j] = white;
            else
                color[k][j] = black;
        }
    });
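
A side note on comparing the R+G+B sum against bounds multiplied by 3: because the original average uses integer division, the two tests agree exactly on the lower bound but can differ by up to two sum values at the upper bound. The following sketch checks this by brute force; the ThresholdCheck class and its method names are hypothetical helpers, not part of either code listing.

```csharp
using System;

static class ThresholdCheck
{
    // The question's test: integer average of the three channels against the bounds.
    public static bool ByAverage(int sum, int lower, int upper)
    {
        int gray = sum / 3;
        return gray >= lower && gray <= upper;
    }

    // Division-free test that is exactly equivalent for non-negative sums:
    // sum / 3 >= lower  <=>  sum >= 3 * lower
    // sum / 3 <= upper  <=>  sum <= 3 * upper + 2
    public static bool BySum(int sum, int lower, int upper)
    {
        return sum >= 3 * lower && sum <= 3 * upper + 2;
    }

    // Counts the channel sums (0..765) on which the two tests disagree.
    public static int CountMismatches(int lower, int upper)
    {
        int mismatches = 0;
        for (int sum = 0; sum <= 3 * 255; sum++)
        {
            if (ByAverage(sum, lower, upper) != BySum(sum, lower, upper))
                mismatches++;
        }
        return mismatches;
    }

    static void Main()
    {
        Console.WriteLine(CountMismatches(51, 204));
    }
}
```

Whether the extra `+ 2` matters depends on how strict the threshold needs to be; for a visual filter the difference is at most one gray level at the upper edge.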

I would not parallelize both width and height; you would then likely run into the opposite problem of not giving each iteration enough work to do.

I highly recommend you download and read Patterns for Parallel Programming; it goes into this exact example when discussing how much work you should give a Parallel.For. Look at the "Very Small Loop Bodies" and "Too fine-grained, Too coarse-grained" anti-patterns starting at the bottom of page 26 of the C# version to see the exact problems you are running into.
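
As a rough illustration of the usual remedy for very small loop bodies, the sketch below lets Partitioner.Create hand each worker a contiguous index range, so the delegate is invoked once per range instead of once per element. It assumes a flat byte array with one grayscale value per pixel, which is a simplification of the question's Color[][] data; the RangeThreshold class is a hypothetical name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class RangeThreshold
{
    // gray: one grayscale value per pixel (assumed layout, not the question's Color[][]).
    // Returns a new array holding 255 where the value lies in [lower, upper], else 0.
    public static byte[] Apply(byte[] gray, int lower, int upper)
    {
        var result = new byte[gray.Length];
        if (gray.Length == 0)
            return result; // Partitioner.Create rejects an empty range

        // Each range is a Tuple<int, int> of (fromInclusive, toExclusive).
        Parallel.ForEach(Partitioner.Create(0, gray.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                bool inRange = gray[i] >= lower && gray[i] <= upper;
                result[i] = inRange ? (byte)255 : (byte)0;
            }
        });

        return result;
    }

    static void Main()
    {
        byte[] output = Apply(new byte[] { 10, 100, 200 }, 50, 150);
        Console.WriteLine(string.Join(",", output));
    }
}
```

The inner `for` loop runs sequentially inside each range, so the per-call overhead of the delegate is spread over many pixels.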

Also, I would look into using LockBits for reading the pixel data in and out instead of GetPixel and SetPixel, as we discussed in the comments.

Using LockBits I managed to cut the time from ~165 ms to ~55 ms per frame. Then I did some more research and combined LockBits with pointer operations in an unsafe context and the Parallel.For loop. The resulting code:

Bitmap class:

public class Bitmap
{
    System.Drawing.Bitmap processed;
    public System.Drawing.Bitmap Processed { get { return processed; } set { processed = value; } }
    // ...
}    

The method:

int lowerBound = 3*(int)((float)lowerBoundPercent * 255.0 / 100.0);
int upperBound = 3*(int)((float)upperBoundPercent * 255.0 / 100.0);

System.Drawing.Bitmap bp = bitmap.Processed;

int width = bitmap.Width;
int height = bitmap.Height;

Rectangle rect = new Rectangle(0, 0, width, height);
// The loop below advances 4 bytes per pixel, so this assumes bp.PixelFormat is a 32bpp format.
System.Drawing.Imaging.BitmapData bpData = bp.LockBits(rect, System.Drawing.Imaging.ImageLockMode.ReadWrite, bp.PixelFormat);

unsafe
{
    byte* s0 = (byte*)bpData.Scan0.ToPointer();
    int stride = bpData.Stride;

    Parallel.For(0, height, y1 =>
    {
        int posY = y1 * stride;
        byte* cpp = s0 + posY;

        for (int x = 0; x < width; x++)
        {
            int total = cpp[0] + cpp[1] + cpp[2];
            if (total >= lowerBound && total <= upperBound)
            {
                cpp[0] = 255;
                cpp[1] = 255;
                cpp[2] = 255;
                cpp[3] = 255;
            }
            else
            {
                cpp[0] = 0;
                cpp[1] = 0;
                cpp[2] = 0;
                cpp[3] = 255;
            }

            cpp += 4;
        }
    });
}

bp.UnlockBits(bpData);

With this kind of work division in the Parallel.For loop the code executes in 1-5 ms, which means approximately a 70x speed up!
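
For anyone who wants to check the row addressing without a bitmap or an unsafe context, here is a minimal sketch of the same per-row loop over a managed byte array. It assumes 4 bytes per pixel in B, G, R, A order and a positive stride that may include padding; the BgraThreshold class and its parameter names are hypothetical, and the array merely stands in for the memory that LockBits exposes.

```csharp
using System;
using System.Threading.Tasks;

static class BgraThreshold
{
    // buffer: pixel bytes row by row, 4 bytes per pixel (B, G, R, A) -- assumed layout.
    // stride: bytes from the start of one row to the start of the next (>= width * 4).
    // lowerSum/upperSum: bounds on B + G + R, i.e. already multiplied by 3.
    public static void Apply(byte[] buffer, int width, int height, int stride,
                             int lowerSum, int upperSum)
    {
        // One parallel iteration per row, as in the method above.
        Parallel.For(0, height, y =>
        {
            int p = y * stride; // offset of the first pixel in row y
            for (int x = 0; x < width; x++, p += 4)
            {
                int total = buffer[p] + buffer[p + 1] + buffer[p + 2];
                byte value = (total >= lowerSum && total <= upperSum) ? (byte)255 : (byte)0;
                buffer[p] = value;
                buffer[p + 1] = value;
                buffer[p + 2] = value;
                buffer[p + 3] = 255; // fully opaque
            }
        });
    }

    static void Main()
    {
        // A 1x1 image with no row padding.
        var pixel = new byte[] { 100, 100, 100, 0 };
        Apply(pixel, 1, 1, 4, 90, 450);
        Console.WriteLine(string.Join(",", pixel));
    }
}
```

Rows never overlap because each iteration touches only the bytes from `y * stride` to `y * stride + width * 4`, which is why no locking is needed inside the loop.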

I tried making the chunks for the loop 4x and 8x bigger and the time range was still 1-5 ms, so I won't go into that. The loop is fast enough anyway.

Thank you very much for your answer, Scott, and thanks everyone for the input in the comments.
