
Why do eight processes with 2 threads each create more load than one process with 16 threads?

I have a simple program which starts n threads and creates some load on each thread. If I start only one thread, one core gets about 100% load. If I start one process with 16 threads (which means one thread per core), I only get about 80% load. If I start 8 processes with 2 threads each (which still means one thread per core), I get about 99% load. I don't use any locking in this sample.

What is the reason for this behavior? I understand that the load goes down if there are 100 threads working, because the OS has to do a lot of scheduling. But in this case there are only as many threads as cores.

It is even worse (for me at least). If I add a simple Thread.Sleep(0) in my loop, the load with one process and 16 threads increases to 95%.

Can anyone answer this, or provide a link with more information about this specific topic?

One process with 16 threads

Eight processes with 2 threads each

One process with 16 threads and Thread.Sleep(0)

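//Sample application which reads the number of threads to be started from Console.ReadLine
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Threading;
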
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Enter the number of threads to be started");
            int numberOfThreadsToStart;

            string input = Console.ReadLine();

            int.TryParse(input, out numberOfThreadsToStart);
            if(numberOfThreadsToStart < 1)
            {
                Console.WriteLine("No valid number of threads entered. Exit now");
                Thread.Sleep(1500);
                return;
            }

            List<Thread> threadList = new List<Thread>();
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < numberOfThreadsToStart; i++)
            {
                Thread workerThread = new Thread(MakeSomeLoad);
                workerThread.Start();
                threadList.Add(workerThread);
            }

            while (true)
            {
                Console.WriteLine("I'm spinning... ");
                Thread.Sleep(2000);
            }
        }

        static void MakeSomeLoad()
        {
            for (int i = 0; i < 100000000; i++)
            {

                for (int j = 0; j < i; j++)
                {
                    //uncomment the following line to increase the load
                    //Thread.Sleep(0);
                    StringBuilder sb = new StringBuilder();
                    sb.Append("hello world" + j);
                }
            }
        }
    }

Your test looks very GC heavy. If you have 16 threads in one process, the GC will run more often in that process, and since the client (workstation) GC isn't parallel, this leads to a lower load, i.e. you have 16 garbage-producing threads per GC thread.

On the other hand, if you run 8 processes with two threads each, you get only two threads producing garbage for each GC thread, and the GC can work in parallel across those processes.

If you write a test that produces less garbage, and uses more CPU directly, you will likely get different results.
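One way to check whether the GC is really involved is to count collections around each kind of work. The following is only a rough sketch: the class and method names (GcPressureProbe, AllocatingWork, ArithmeticWork, Gen0CollectionsDuring) are made up for illustration, and the iteration count is an arbitrary assumption. It compares an inner loop modelled on the question's code with a loop that does plain arithmetic and allocates nothing.

```csharp
using System;
using System.Text;

static class GcPressureProbe
{
    // Allocating work modelled on the question's inner loop:
    // one StringBuilder and one concatenated string per iteration.
    public static int AllocatingWork(int iterations)
    {
        int total = 0;
        for (int j = 0; j < iterations; j++)
        {
            StringBuilder sb = new StringBuilder();
            sb.Append("hello world" + j);
            total += sb.Length;
        }
        return total;
    }

    // Pure integer arithmetic; nothing is allocated on the managed heap.
    public static int ArithmeticWork(int iterations)
    {
        int total = 0;
        for (int j = 0; j < iterations; j++)
        {
            total = unchecked(total * 31 + j);
        }
        return total;
    }

    // Returns how many generation 0 collections happened in this process
    // while the given work was running.
    public static int Gen0CollectionsDuring(Func<int, int> work, int iterations)
    {
        int before = GC.CollectionCount(0);
        work(iterations);
        return GC.CollectionCount(0) - before;
    }

    static void Main()
    {
        const int iterations = 5000000; // arbitrary size, chosen only for the sketch
        Console.WriteLine("gen0 collections, allocating: "
            + Gen0CollectionsDuring(AllocatingWork, iterations));
        Console.WriteLine("gen0 collections, arithmetic: "
            + Gen0CollectionsDuring(ArithmeticWork, iterations));
    }
}
```

If the allocating variant reports many collections and the arithmetic one reports few or none, that would support the GC explanation. The counter is process-wide, so run it single-threaded to keep the numbers readable.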

(Note that this is only speculation. I didn't run your test, and since I only have a dual-core CPU, my results would differ from yours anyway.)

Something else to consider is that there are different modes to the garbage collector:

  • Server GC
  • Workstation GC – Concurrent (default except for ASP.NET)
  • Workstation GC – Non Concurrent

You can find some of the graphic details of each here.

Since your process uses lots of threads and allocates a whole lot of memory, you should try server GC.

The server GC is optimized for high throughput and high scalability in server applications where there is a consistent load and requests are allocating and deallocating memory at a high rate. The server GC uses one heap and one GC thread per processor and tries to balance the heaps as much as possible. At the time of a garbage collection, the GC threads work on their respective heaps and rendezvous at certain points. Since they all work on their own heaps, minimal locking etc. is needed, which makes it very efficient in this type of situation.

You enable the Server GC in your App.config:

<configuration>
 <runtime>
   <gcServer enabled="true" />
 </runtime>
</configuration> 

Note that this will only work on a multiprocessor (or multicore) system. If Windows reports only one processor, then you will get Workstation GC – Non Concurrent instead.
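To confirm that the setting actually took effect, the process can report its own GC mode at startup. This is a minimal sketch: GcModeReport and Describe are hypothetical names, and the only runtime members it relies on are GCSettings.IsServerGC, GCSettings.LatencyMode and Environment.ProcessorCount.

```csharp
using System;
using System.Runtime;

static class GcModeReport
{
    // Builds a one-line description of the GC flavor this process is using.
    public static string Describe()
    {
        string flavor = GCSettings.IsServerGC ? "Server" : "Workstation";
        return flavor + " GC, latency mode " + GCSettings.LatencyMode
            + ", " + Environment.ProcessorCount + " logical processors";
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Calling something like this at the top of the test program makes it easy to see which mode each of the 16-thread and 2-thread runs was really using.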

Use something like Thread.SpinWait(int.MaxValue) to produce CPU load because your program mainly produces memory load, which may lead to effects like false sharing. As CodeInChaos already stated, the GC activity will also very likely impact performance.
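A rough sketch of such an allocation-free load, assuming one worker thread per logical processor: the SpinLoad class, the Spin method and the round counts are placeholders made up for the example, not anything from the question. It spins in smaller chunks rather than one int.MaxValue call so that the run ends in a reasonable time.

```csharp
using System;
using System.Threading;

static class SpinLoad
{
    // Burns CPU without touching the managed heap.
    // Returns the number of completed rounds.
    public static long Spin(int rounds, int iterationsPerRound)
    {
        long completed = 0;
        for (int r = 0; r < rounds; r++)
        {
            Thread.SpinWait(iterationsPerRound);
            completed++;
        }
        return completed;
    }

    static void Main()
    {
        // Assumption: one worker per logical processor, as in the question.
        int threadCount = Environment.ProcessorCount;
        Thread[] workers = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            workers[i] = new Thread(() => Spin(2000, 1000000));
            workers[i].Start();
        }

        // Wait for all workers so the process exits cleanly.
        foreach (Thread worker in workers)
        {
            worker.Join();
        }
    }
}
```

If the single-process run reaches roughly the same load as the multi-process run with this worker, the difference seen in the original test most likely comes from memory allocation and GC rather than from thread scheduling.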

Like the others, I suspect this has something to do with the GC. The load example uses huge amounts of memory; by the end of the two for loops the StringBuilder objects will be asking for gigabyte-sized arrays to store their data in.

There are a couple of reasons that the GC thread could slow the processing.

One is that as soon as the VM runs out of memory most of the threads will have to be suspended and wait for the GC to free up memory before they can continue (this is because all threads will be asking for more memory at approximately the same time during execution).

The second has to do with context switching of the threads (and this is likely the biggest reason). If thread A, running on core X, runs out of memory, then the GC will either have to be loaded onto core X or pull all of thread A's memory out of core X's cache into the cache of the core the GC is running on. Either way, the CPU will have to wait for its cache to be loaded with memory from RAM. RAM is fast compared to a hard drive, but compared to a CPU it is painfully slow. And while the CPU is waiting for the RAM to respond it cannot do any processing, which reduces the load.

When you have multiple VMs, each VM can run on its own core and not care about what the other VMs are up to. And when the GC gets invoked, there is no need for a context switch, as the GC can just run on the same core as the two threads in that VM.
