
Multiple Parallel.ForEach loops in .NET

In a .NET process, there is only one managed thread pool. We can set the minimum and maximum thread counts as needed via public members of the ThreadPool class.
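For reference, these limits are exposed as static methods on the ThreadPool class rather than instance properties. A minimal sketch (the value 16 is an arbitrary example, not a recommendation):

```csharp
using System;
using System.Threading;

class ThreadPoolLimits
{
    static void Main()
    {
        // Read the current minimum and maximum worker/IOCP thread counts.
        ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);
        Console.WriteLine($"Min: {minWorker}/{minIocp}, Max: {maxWorker}/{maxIocp}");

        // Raise the minimum so the pool creates up to 16 threads on demand,
        // without waiting for the slow thread-injection heuristic.
        bool ok = ThreadPool.SetMinThreads(16, 16);
        Console.WriteLine($"SetMinThreads succeeded: {ok}");
    }
}
```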

In .NET, we also have Parallel.ForEach, which gets its threads from this managed thread pool under the hood.

In Parallel.ForEach we can also set the MaxDegreeOfParallelism to limit the maximum number of threads.

I have two Parallel.ForEach loops running in parallel. One has MaxDegreeOfParallelism set to 3 and the other has it set to 7.
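In code, this setup looks roughly like the following sketch (the two data sources and the empty loop bodies are placeholders):

```csharp
using System.Linq;
using System.Threading.Tasks;

class TwoLoops
{
    static void Main()
    {
        var data1 = Enumerable.Range(1, 100); // placeholder work items
        var data2 = Enumerable.Range(1, 100);

        // Run the two loops concurrently, each with its own parallelism limit.
        Task t1 = Task.Run(() => Parallel.ForEach(data1,
            new ParallelOptions { MaxDegreeOfParallelism = 3 },
            item => { /* work */ }));
        Task t2 = Task.Run(() => Parallel.ForEach(data2,
            new ParallelOptions { MaxDegreeOfParallelism = 7 },
            item => { /* work */ }));
        Task.WaitAll(t1, t2);
    }
}
```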

My question is: Do both my Parallel.ForEach loops use the same thread pool under the hood? If yes, how does Parallel.ForEach limit the threads with MaxDegreeOfParallelism? How do multiple Parallel.ForEach loops and one managed thread pool work together? It would really help if you could provide a high-level explanation or some pointers before I peek into the .NET Core source code.

  • Do both my Parallel.ForEach loops use the same thread pool under the hood?

    Yes

  • How does Parallel.ForEach limit the threads with MaxDegreeOfParallelism?

    ParallelOptions.MaxDegreeOfParallelism: Gets or sets the maximum number of concurrent tasks enabled by this ParallelOptions instance.

    By default, methods on the Parallel class attempt to use all available processors, are non-cancelable, and target the default TaskScheduler (TaskScheduler.Default). ParallelOptions enables overriding these defaults.

  • How do multiple Parallel.ForEach loops and one managed thread pool work together?

    They share the same thread pool. As described in the documentation:

    Generally, you do not need to modify this setting. However, you may choose to set it explicitly in advanced usage scenarios such as these:

    When you're running multiple algorithms concurrently and want to manually define how much of the system each algorithm can utilize. You can set a MaxDegreeOfParallelism value for each.
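The defaults quoted above (all processors, non-cancelable, default scheduler) can each be overridden through ParallelOptions. A sketch, with the chosen values being arbitrary examples:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class OptionsDemo
{
    static void Main()
    {
        var cts = new CancellationTokenSource();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = 4,            // default is -1 (unbounded)
            CancellationToken = cts.Token,         // default is non-cancelable
            TaskScheduler = TaskScheduler.Default  // default is the thread pool scheduler
        };
        Parallel.ForEach(Enumerable.Range(1, 10), options,
            item => Console.WriteLine($"Processing #{item}"));
    }
}
```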

By default a Parallel.ForEach loop uses threads from the ThreadPool, which is a static class, and there is only one per process. It is possible to modify this behavior by configuring the TaskScheduler property of the ParallelOptions. Creating a custom TaskScheduler that functions as an alternative ThreadPool is not exactly trivial, but not rocket science either. If you are interested, you can find some material here to help you get started (article).
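Short of writing a full custom TaskScheduler, the built-in ConcurrentExclusiveSchedulerPair can serve as a simple limited-concurrency scheduler. A sketch of that approach (note the caveat that the calling thread also participates in the loop):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class CustomSchedulerDemo
{
    static void Main()
    {
        // The ConcurrentScheduler of this pair runs at most 2 tasks
        // concurrently, regardless of the MaxDegreeOfParallelism value.
        var pair = new ConcurrentExclusiveSchedulerPair(
            TaskScheduler.Default, maxConcurrencyLevel: 2);

        var options = new ParallelOptions
        {
            TaskScheduler = pair.ConcurrentScheduler,
            MaxDegreeOfParallelism = 10 // effectively capped by the scheduler
        };
        Parallel.ForEach(Enumerable.Range(1, 10), options,
            item => Console.WriteLine($"Processing #{item}"));
    }
}
```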

Now, what happens when two parallel loops run concurrently is that both schedule work on ThreadPool threads. If both are configured with a specific MaxDegreeOfParallelism, and the sum of the two does not exceed the minimum number of threads that the ThreadPool creates on demand¹, then the two loops will not interfere with each other's scheduling. Of course they may still compete with each other for CPU resources, in case these are scarce. In that case the operating system will be the arbiter.

In case at least one of the parallel loops is not configured with a specific MaxDegreeOfParallelism, the effective default of this option is -1, which means unbounded parallelism. This will cause the ThreadPool to become immediately saturated, and to remain saturated until the source enumerable of the unconfigured parallel loop completes. During this period the two parallel loops will interfere heavily with each other, and who gets the extra thread that the saturated ThreadPool injects every ~500 msec is a matter of who asked for it first. On top of that, a saturated ThreadPool negatively affects any other independent callbacks, timer events, async continuations etc. that may also be active during this period.

In case both parallel loops are configured, and the sum of both MaxDegreeOfParallelism values exceeds the number of available threads, the situation is similar to the previous one. The only difference is that the number of threads in the ThreadPool will gradually increase, so the saturation incident may end before the parallel loops finish executing.

Below is an example that demonstrates this behavior:

ThreadPool.SetMinThreads(4, 4); // only 4 worker threads are created instantly on demand
Task[] tasks = new[] { 'A', 'B' }.Select(name => Task.Run(() =>
{
    Thread.Sleep(100); if (name == 'B') Thread.Sleep(500); // loop B starts ~500 msec later
    Print($"{name}-Starting");
    var options = new ParallelOptions() { MaxDegreeOfParallelism = 10 };
    Parallel.ForEach(Enumerable.Range(1, 10), options, item =>
    {
        Print($"{name}-Processing #{item}");
        Thread.Sleep(1000); // simulate 1 sec of work per item
    });
    Print($"{name}-Finished");
})).ToArray();
Task.WaitAll(tasks);

static void Print(string line)
{
    Console.WriteLine(
        $"{DateTime.Now:HH:mm:ss.fff} [{Thread.CurrentThread.ManagedThreadId}] > {line}");
}

Output:

15:34:20.054 [4] > A-Starting
15:34:20.133 [6] > A-Processing #2
15:34:20.133 [7] > A-Processing #3
15:34:20.133 [4] > A-Processing #1
15:34:20.552 [5] > B-Starting
15:34:20.553 [5] > B-Processing #1
15:34:20.956 [8] > A-Processing #4
15:34:21.133 [4] > A-Processing #5
15:34:21.133 [7] > A-Processing #6
15:34:21.133 [6] > A-Processing #7
15:34:21.553 [5] > B-Processing #2
15:34:21.957 [8] > A-Processing #8
15:34:21.957 [9] > A-Processing #9
15:34:22.133 [4] > A-Processing #10
15:34:22.134 [7] > B-Processing #3
15:34:22.134 [6] > B-Processing #4
15:34:22.553 [5] > B-Processing #5
15:34:22.957 [8] > B-Processing #6
15:34:22.958 [9] > B-Processing #7
15:34:23.134 [4] > A-Finished
15:34:23.134 [4] > B-Processing #8
15:34:23.135 [7] > B-Processing #9
15:34:23.135 [6] > B-Processing #10
15:34:24.135 [5] > B-Finished

(Try it on Fiddle)

You can see that the parallel loop A initially utilizes 3 threads (threads 4, 6 and 7), while the parallel loop B utilizes only thread 5. At that point the ThreadPool is saturated. Around 500 msec later the new thread 8 is injected, and is taken by the A loop. The B loop still has only one thread. Another second later one more thread, thread 9, is injected. This too goes to loop A, setting the score at 5-1 in favor of loop A. There is no politeness or courtesy in this battle. It's a wild competition for limited resources. If you expect to have more than one parallel loop running in parallel, make sure that all of them have their MaxDegreeOfParallelism option configured, and that the ThreadPool can create enough threads on demand to accommodate all of them.


Note: The above describes the behavior of the static Parallel class as of .NET 5. Parallelism achieved through PLINQ (the AsParallel LINQ operator) does not behave the same in all aspects. Also, in the future the Parallel class may get new methods with different defaults.
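As one concrete difference, PLINQ configures its parallelism with the WithDegreeOfParallelism operator, which specifies the number of tasks PLINQ uses rather than just an upper bound. A sketch:

```csharp
using System;
using System.Linq;

class PlinqDemo
{
    static void Main()
    {
        int[] results = Enumerable.Range(1, 10)
            .AsParallel()
            .WithDegreeOfParallelism(3) // PLINQ partitions the work among 3 tasks
            .Select(n => n * n)
            .ToArray();

        // PLINQ output order is not guaranteed without AsOrdered, so sort for display.
        Console.WriteLine(string.Join(", ", results.OrderBy(x => x)));
    }
}
```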

¹ It is configured by the method ThreadPool.SetMinThreads, and AFAIK by default it is equal to Environment.ProcessorCount.
