
How to enforce a sequence of ordered execution in Parallel.For?

I have a simple parallel loop doing stuff, and afterwards I save the results to a file.

object[] items; // array with all items
object[] resultArray = new object[numItems];
Parallel.For(0, numItems, (i) => 
{ 
    object res = doStuff(items[i], i);
    resultArray[i] = res;
});

foreach (object res in resultArray)
{
    sequentiallySaveResult(res);
}

For the saving, I need to write the results in the correct sequential order. By putting the results into resultArray, the correct order is restored.

However, the results are pretty big and take a lot of memory. I would like to process the items in order, as in e.g. four threads start and work on items 1-4, the next free thread takes item 5, and so on.

With that, I could start another thread that monitors which item needs to be written next to the array (or each thread could emit an event when an item is finished), so I can already start writing the first results while the later items are still being processed, and then free the memory.

Is it possible for Parallel.For to process the items in the given order? Of course I could use a ConcurrentQueue, put all the indices into it in the right order, and start threads manually.

But if possible, I would like to keep all the automation regarding how many threads to use etc. that is in the Parallel.For implementation.

Disclaimer: I cannot switch to a ForEach, I need the i.

EDIT #1:
Currently, the execution order is totally random, one example:

Processing item 1/255
Processing item 63/255
Processing item 32/255
Processing item 125/255
Processing item 94/255
Processing item 156/255
Processing item 187/255
Processing item 249/255
...

EDIT #2:
More details about the job being done:

I process a grayscale image and need to extract information for each "layer" (the items in the example above), so I go from 0 to 255 (for 8 bit) and perform a task on the image.

I have a class to access the pixel values concurrently:

unsafe class UnsafeBitmap : IDisposable
{
    private BitmapData bitmapData;
    private Bitmap gray;
    private int bytesPerPixel;
    private int heightInPixels;
    private int widthInBytes;
    private byte* ptrFirstPixel;

    public void PrepareGrayscaleBitmap(Bitmap bitmap, bool invert)
    {
        // MakeGrayscale is a user-defined helper (not shown) that returns a grayscale copy of the bitmap
        gray = MakeGrayscale(bitmap, invert);

        bitmapData = gray.LockBits(new Rectangle(0, 0, gray.Width, gray.Height), ImageLockMode.ReadOnly, gray.PixelFormat);
        bytesPerPixel = System.Drawing.Bitmap.GetPixelFormatSize(gray.PixelFormat) / 8;
        heightInPixels = bitmapData.Height;
        widthInBytes = bitmapData.Width * bytesPerPixel;
        ptrFirstPixel = (byte*)bitmapData.Scan0;
    }

    public byte GetPixelValue(int x, int y)
    {
        return (ptrFirstPixel + ((heightInPixels - y - 1) * bitmapData.Stride))[x * bytesPerPixel];
    }

    public void Dispose()
    {
        gray.UnlockBits(bitmapData);
    }
}

And the loop is:

UnsafeBitmap ubmp; // initialized, has the correct bitmap
int numLayers = 255;
int bitmapWidthPx = 10000;
int bitmapHeightPx = 10000;
object[] resultArray = new object[numLayers];

Parallel.For(0, numLayers, (i) =>
{
    // LayerResult is a placeholder name for the per-layer result type
    // (the actual type exposing AddStart/AddEnd is not shown in the question)
    var result = new LayerResult();
    bool inLine = false;

    for (int x = 0; x < bitmapWidthPx; x++)
    {
        inLine = false;
        for (int y = 0; y < bitmapHeightPx; y++)
        {
            byte pixel_value = ubmp.GetPixelValue(x, y);

            if (i <= pixel_value && !inLine)
            {
                result.AddStart(x, y);
                inLine = true;
            }
            else if ((i > pixel_value || y == bitmapHeightPx - 1) && inLine)
            {
                result.AddEnd(x, y - 1);
                inLine = false;
            }
        }
    }
    resultArray[i] = result;
});

foreach (object res in resultArray)
{
    sequentiallySaveResult(res);
}

I would also like to start a thread for the saving, which checks whether the item that needs to be written next is available, writes it, and discards it from memory. For this, it would be good if the processing starts in order, so that the results arrive roughly in order. If the result for layer 5 arrives second to last, I have to wait with writing layer 5 (and everything after it) until the end.

If 4 threads start, begin processing layers 1-4, and when a thread is done it starts processing layer 5, the next one layer 6 and so on, the results will come more or less in the same order, and I can start writing results to the file and discarding them from memory. A rough sketch of the writer loop I have in mind is shown below.
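A minimal sketch of that writer idea, assuming the Parallel.For from the snippet above fills resultArray slot by slot; the polling with Thread.Sleep is just a placeholder for a proper signal (e.g. an event raised per finished layer):

// Writer loop on its own thread: saves results strictly in index order
// and frees each slot as soon as it has been written.
var writer = Task.Run(() =>
{
    int nextToWrite = 0;
    while (nextToWrite < numLayers)
    {
        object res = Volatile.Read(ref resultArray[nextToWrite]);
        if (res != null)                       // the next layer is finished
        {
            sequentiallySaveResult(res);
            resultArray[nextToWrite] = null;   // discard from memory
            nextToWrite++;
        }
        else
        {
            Thread.Sleep(10);                  // crude polling placeholder
        }
    }
});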

Well, if you want to order thread operations, Thread Synchronization 101 teaches us to use condition variables, and to implement those with C# tasks you can use a SemaphoreSlim, which provides an async wait function, SemaphoreSlim.WaitAsync. That plus a counter check will get you the desired result.
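A minimal sketch of that idea, reusing items, doStuff and sequentiallySaveResult from the question: one gate per index, where gate i is released only after item i-1 has been saved, so the saves happen strictly in order while the work itself still runs in parallel.

// One SemaphoreSlim per item; gate 0 starts open, gate i+1 is opened
// once item i has been saved, so the saves run strictly in index order.
SemaphoreSlim[] gates = Enumerable.Range(0, numItems)
    .Select(i => new SemaphoreSlim(i == 0 ? 1 : 0))
    .ToArray();

Task[] tasks = Enumerable.Range(0, numItems).Select(async i =>
{
    object res = await Task.Run(() => doStuff(items[i], i)); // CPU work runs in parallel
    await gates[i].WaitAsync();                              // wait until it is our turn
    sequentiallySaveResult(res);                             // saves happen in order 0, 1, 2, ...
    if (i + 1 < numItems)
        gates[i + 1].Release();                              // let the next item save
}).ToArray();

await Task.WhenAll(tasks);                                   // inside an async method

Keep in mind that this orders only the saving: it starts all items at once and keeps each computed result in memory until its turn, so some throttling would still be needed to bound memory usage.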

However, I'm not convinced it's needed, because if I understand correctly you just want to save the results sequentially to avoid storing them in memory. In that case you can use memory mapped files to either:

  1. If the results have the same size, simply write your buffer at the location index * size (see the sketch after this list).

  2. If the results have different sizes, write to a temporary mapped file as you get your results, and have another thread copy them into the correct sequential output file as they come. This is an IO-bound operation, so don't use the task pool for it.
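A minimal sketch of option 1, assuming every serialized result has a known fixed size and that SerializeResult is a hypothetical helper standing in for however the results are turned into bytes:

using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

long resultSize = 1024 * 1024;                       // assumed fixed size per result, in bytes
long capacity = resultSize * numItems;

using var mmf = MemoryMappedFile.CreateFromFile(
    "results.bin", FileMode.Create, null, capacity);

Parallel.For(0, numItems, i =>
{
    // SerializeResult is a hypothetical placeholder for the user's own serialization.
    byte[] buffer = SerializeResult(doStuff(items[i], i));

    // Each iteration writes into its own slice of the file,
    // so no ordering or locking between iterations is needed.
    using var accessor = mmf.CreateViewAccessor(i * resultSize, resultSize);
    accessor.WriteArray(0, buffer, 0, buffer.Length);
});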

The Parallel class knows how to parallelize a workload, but it doesn't know how to merge the processed results. So I would suggest using PLINQ instead. Your requirement of saving the results in the original order, and concurrently with the processing, makes it a bit trickier than usual, but it is still perfectly doable:

IEnumerable<object> results = Partitioner
    .Create(items, EnumerablePartitionerOptions.NoBuffering)
    .AsParallel()
    .AsOrdered()
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .Select((item, index) => DoStuff(item, index))
    .AsEnumerable();

foreach (object result in results)
{
    SequentiallySaveResult(result);
}

Explanation:

  1. The AsOrdered operator is required for retrieving the results in the original order.
  2. The WithMergeOptions operator is required to prevent buffering of the results, so that they are saved as soon as they become available.
  3. The Partitioner.Create is required because the source of the data is an array, and PLINQ partitions arrays statically by default. This means that the array is split into ranges, and one thread is allocated to process each range. That is a great performance optimization in general, but in this case it defeats the purpose of the timely and ordered retrieval of the results. So a dynamic partitioner is needed, which enumerates the source sequentially from start to end.
  4. The EnumerablePartitionerOptions.NoBuffering configuration prevents the worker threads employed by PLINQ from grabbing more than one item at a time (the default PLINQ partitioning cleverness known as "chunk partitioning").
  5. The AsEnumerable is not really needed. It is there just to signify the end of the parallel processing. The foreach that follows treats the ParallelQuery<object> as an IEnumerable<object> anyway.

Because of all the trickery required, and because this solution is not really flexible in case you later need to add more concurrent heterogeneous steps to the processing pipeline, I would suggest keeping in mind the option of stepping up to the TPL Dataflow library. It is a library that unlocks lots of powerful options in the realm of parallel processing. A rough sketch of such a pipeline is shown below.
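A minimal sketch, assuming hypothetical ProcessLayer and SaveLayer helpers stand in for the per-layer work and the file writing: a TransformBlock runs the processing in parallel while (by default) preserving the input order, and a linked single-threaded ActionBlock saves the results one by one as they emerge. The BoundedCapacity setting limits how many finished layers can pile up in memory ahead of the writer.

using System.Threading.Tasks.Dataflow;   // NuGet package: System.Threading.Tasks.Dataflow

// Processing block: parallel, order-preserving (EnsureOrdered defaults to true).
var process = new TransformBlock<int, object>(
    i => ProcessLayer(i),                                 // hypothetical per-layer work
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount,
        BoundedCapacity = Environment.ProcessorCount * 2  // bounds results held in memory
    });

// Saving block: runs on a single thread, so the file is written sequentially.
var save = new ActionBlock<object>(r => SaveLayer(r));

process.LinkTo(save, new DataflowLinkOptions { PropagateCompletion = true });

for (int i = 0; i < numLayers; i++)
    await process.SendAsync(i);   // SendAsync honors the bounded capacity (back-pressure)

process.Complete();
await save.Completion;

With this setup nothing has to buffer the whole result set: the TransformBlock releases results in index order, and the bounded capacity throttles the producers whenever the writer falls behind.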
