
How to enforce a sequence of ordered execution in Parallel.For?

I have a simple parallel loop doing stuff, and afterwards I save the results to a file.

object[] items; // array with all items
object[] resultArray = new object[numItems];
Parallel.For(0, numItems, (i) => 
{ 
    object res = doStuff(items[i], i);
    resultArray[i] = res;
});

foreach (object res in resultArray)
{
    sequentiallySaveResult(res);
}

For the saving, I need to write the results in the correct sequential order. By putting the results into resultArray, the order of the results is correct again.

However, the results are pretty big and take a lot of memory, so I would like to process the items in order: e.g., four threads start and work on items 1-4, the next free thread takes item 5, and so on.

With that, I could start another Thread, monitoring the item that needs to be written next in the array (or each thread could emit an event when an item is finished), so I can already start writing the first results while the later items are still being processed and then free the memory.

Is it possible for Parallel.For to process the items in the given order? Of course, I could use a ConcurrentQueue, put all the indices into it in the right order, and start threads manually.

But if possible, I would like to keep all the automation (how many threads to use, etc.) that is in the Parallel.For implementation.

Disclaimer: I cannot switch to a ForEach; I need the index i.

EDIT #1:
Currently, the execution order is totally random, one example:

Processing item 1/255
Processing item 63/255
Processing item 32/255
Processing item 125/255
Processing item 94/255
Processing item 156/255
Processing item 187/255
Processing item 249/255
...

EDIT #2:
More details about the job being done:

I process a grayscale image and need to extract information for each "layer" (the items in the example above), so I go from 0 to 255 (for 8-bit) and perform a task on the image.

I have a class to access the pixel values concurrently:

 unsafe class UnsafeBitmap : IDisposable
 {
     private BitmapData bitmapData;
     private Bitmap gray;
     private int bytesPerPixel;
     private int heightInPixels;
     private int widthInBytes;
     private byte* ptrFirstPixel;

     public void PrepareGrayscaleBitmap(Bitmap bitmap, bool invert)
     {
         gray = MakeGrayscale(bitmap, invert);

         bitmapData = gray.LockBits(new Rectangle(0, 0, gray.Width, gray.Height), ImageLockMode.ReadOnly, gray.PixelFormat);
         bytesPerPixel = System.Drawing.Bitmap.GetPixelFormatSize(gray.PixelFormat) / 8;
         heightInPixels = bitmapData.Height;
         widthInBytes = bitmapData.Width * bytesPerPixel;
         ptrFirstPixel = (byte*)bitmapData.Scan0;
     }

     public byte GetPixelValue(int x, int y)
     {
         return (ptrFirstPixel + ((heightInPixels - y - 1) * bitmapData.Stride))[x * bytesPerPixel];
     }

     public void Dispose()
     {
         gray.UnlockBits(bitmapData);
     }
 }

And the loop is

UnsafeBitmap ubmp; // initialized, has the correct bitmap
int numLayers = 255;
int bitmapWidthPx = 10000;
int bitmapHeightPx = 10000;
object[] resultArray = new object[numLayers];
Parallel.For(0, numLayers, (i) =>
{
    var result = new LayerResult(); // per-layer result container (type not shown in the question)
    for (int x = 0; x < bitmapWidthPx; x++)
    {
        bool inLine = false;
        for (int y = 0; y < bitmapHeightPx; y++)
        {
            byte pixel_value = ubmp.GetPixelValue(x, y);

            if (i <= pixel_value && !inLine)
            {
                result.AddStart(x, y);
                inLine = true;
            }
            else if ((i > pixel_value || y == bitmapHeightPx - 1) && inLine)
            {
                result.AddEnd(x, y - 1);
                inLine = false;
            }
        }
    }
    resultArray[i] = result;
});

foreach (object res in resultArray)
{
    sequentiallySaveResult(res);
}

And I would also like to start a thread for the saving, which checks whether the item that needs to be written next is available, writes it, and discards it from memory. For this, it would be good if the processing started in order, so that the results arrive roughly in order. If the result for layer 5 arrives second to last, I have to wait to write layer 5 (and all following layers) until the end.

If 4 threads start, start processing layers 1-4, and when a thread is done, starts processing layer 5, next one layer 6 and so on, the results will come more or less in the same order and I can start writing result to the file and discarding them from memory.

Well, if you want to order thread operations, Thread Synchronization 101 teaches us to use condition variables, and to implement those with C# tasks you can use a SemaphoreSlim, which provides an async wait method, SemaphoreSlim.WaitAsync. That plus a counter check will get you the desired result.
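To make the "condition variable plus counter check" idea concrete, here is a minimal synchronous sketch using Monitor.Wait/PulseAll (the classic condition-variable primitives in C#) instead of SemaphoreSlim.WaitAsync; the SaveInOrder helper and the simulated workload are illustrative, not from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class OrderedSaveDemo
{
    static readonly object gate = new object();
    static int nextToWrite = 0;

    // Blocks until it is `index`'s turn, then saves and wakes the other waiters.
    static void SaveInOrder(int index, Action save)
    {
        lock (gate)
        {
            while (nextToWrite != index)
                Monitor.Wait(gate);      // condition variable: wait for our turn
            save();                      // our turn: write the result
            nextToWrite++;               // advance the counter
            Monitor.PulseAll(gate);      // wake waiters so the next index can check
        }
    }

    static void Main()
    {
        var saved = new List<int>();
        Parallel.For(0, 8, i =>
        {
            Thread.Sleep(new Random(i).Next(1, 20)); // simulate uneven work per item
            SaveInOrder(i, () => saved.Add(i));      // items are saved 0,1,2,... regardless
        });
        Console.WriteLine(string.Join(",", saved));
    }
}
```

The processing itself still runs in whatever order Parallel.For chooses; only the save step is serialized into index order, so a thread that finishes item 7 early simply blocks until items 0-6 have been written.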

However, I'm not convinced it's needed: if I understand correctly, you just want to save the results sequentially to avoid holding them all in memory, so you can use memory-mapped files to either:

  1. If the results have the same size, simply write your buffer at the location index * size .

  2. If the results have different sizes, write to a temporary mapped file as you get your results, and have another thread copy them into the correct sequential output file as they come. This is an I/O-bound operation, so don't use the task pool for it.
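Option 1 can be sketched as follows, assuming fixed-size results (here 4 bytes each, with i² standing in for a real result); the file name and item size are made up for the example:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

class MmapDemo
{
    static void Main()
    {
        const int numItems = 16;
        const int itemSize = 4;   // assumption: every result is exactly 4 bytes
        string path = Path.Combine(Path.GetTempPath(), "results.bin");

        using (var mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Create, null, numItems * itemSize))
        {
            // Each worker writes its fixed-size result at offset index * itemSize,
            // so the file ends up in order without any synchronization between threads.
            Parallel.For(0, numItems, i =>
            {
                using (var view = mmf.CreateViewAccessor(i * itemSize, itemSize))
                {
                    view.Write(0, i * i); // the "result" here is just i squared
                }
            });
        }

        // Read the file back sequentially to show it is in index order.
        byte[] bytes = File.ReadAllBytes(path);
        for (int i = 0; i < numItems; i++)
            Console.Write(BitConverter.ToInt32(bytes, i * itemSize) + " ");
        Console.WriteLine();
        File.Delete(path);
    }
}
```

Because every item has its own slot, completion order becomes irrelevant; the ordering problem disappears entirely instead of being solved with waits.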

The Parallel class knows how to parallelize a workload, but doesn't know how to merge the processed results. So I would suggest using PLINQ instead. Your requirement of saving the results in the original order, concurrently with the processing, makes it a bit trickier than usual, but it is still perfectly doable:

IEnumerable<object> results = Partitioner
    .Create(items, EnumerablePartitionerOptions.NoBuffering)
    .AsParallel()
    .AsOrdered()
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .Select((item, index) => DoStuff(item, index))
    .AsEnumerable();

foreach (object result in results)
{
    SequentiallySaveResult(result);
}

Explanation:

  1. The AsOrdered operator is required for retrieving the results in the original order.
  2. The WithMergeOptions operator is required for preventing the buffering of the results, so that they are saved as soon as they become available.
  3. The Partitioner.Create is required because the source of the data is an array, and PLINQ partitions arrays statically by default. This means the array is split into ranges, and one thread is allocated to process each range, which is a great performance optimization in general, but in this case it defeats the purpose of the timely and ordered retrieval of the results. So a dynamic partitioner is needed, to enumerate the source sequentially from start to end.
  4. The EnumerablePartitionerOptions.NoBuffering configuration prevents the worker threads employed by PLINQ from grabbing more than one item at a time (the default PLINQ partitioning cleverness known as "chunk partitioning").
  5. The AsEnumerable is not really needed. It is there just to signify the end of the parallel processing. The foreach that follows treats the ParallelQuery<object> as an IEnumerable<object> anyway.

Because of all the trickery required, and because this solution is not really flexible enough if you later need to add more concurrent heterogeneous steps to the processing pipeline, I would suggest keeping in mind the option of stepping up to the TPL Dataflow library. It is a library that unlocks lots of powerful options in the realm of parallel processing.
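For a rough idea of what the TPL Dataflow version could look like (requires the System.Threading.Tasks.Dataflow NuGet package; block names and the simulated workload are illustrative): a parallel TransformBlock feeds a sequential ActionBlock, and the block's EnsureOrdered behavior (on by default) re-emits results in input order.

```csharp
// Requires the System.Threading.Tasks.Dataflow NuGet package.
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class DataflowDemo
{
    static async Task Main()
    {
        // Parallel processing stage: results are emitted in input order
        // (EnsureOrdered defaults to true), even if items finish out of order.
        var process = new TransformBlock<int, string>(
            async i =>
            {
                await Task.Delay(new Random(i).Next(1, 20)); // simulate uneven work
                return $"layer {i}";
            },
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 4,
                BoundedCapacity = 8   // caps how many finished results sit in memory
            });

        // Sequential saving stage (degree of parallelism is 1 by default).
        var save = new ActionBlock<string>(s => Console.WriteLine("saving " + s));

        process.LinkTo(save, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < 8; i++)
            await process.SendAsync(i);
        process.Complete();
        await save.Completion;
    }
}
```

BoundedCapacity is the piece that addresses the memory concern: the processing stage stops accepting new layers when too many unsaved results have accumulated.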
