
.NET concurrency performance on the client side

I am writing a client-side .NET application which is expected to use a lot of threads. I was warned that .NET performance is very bad when it comes to concurrency. While I am not writing a real-time application, I want to make sure my application is scalable (i.e. allows many threads) and is at least comparable to an equivalent C++ application.

What is your experience? What is a relevant benchmark?

I threw together a quick-and-dirty benchmark in C# using a prime generator as a test. The test generates primes up to a constant limit (I chose 500000) using a simple Sieve of Eratosthenes implementation and repeats the test 800 times, parallelized over a specific number of threads, either using the .NET ThreadPool or standalone threads.

The test was run on a quad-core Q6600 running Windows Vista (x64). This is not using the Task Parallel Library, just simple threads. It was run for the following scenarios:

  • Serial execution (no threading)
  • 4 threads (i.e. one per core), using the ThreadPool
  • 40 threads using the ThreadPool (to test the efficiency of the pool itself)
  • 4 standalone threads
  • 40 standalone threads, to simulate context-switching pressure

The results were:

Test | Threads | ThreadPool | Time
-----+---------+------------+--------
1    | 1       | False      | 00:00:17.9508817
2    | 4       | True       | 00:00:05.1382026
3    | 40      | True       | 00:00:05.3699521
4    | 4       | False      | 00:00:05.2591492
5    | 40      | False      | 00:00:05.0976274

Conclusions one can draw from this:

  • Parallelization isn't perfect (as expected; it never is, no matter the environment), but splitting the load across 4 cores yields about 3.5x the throughput, which is hardly anything to complain about.

  • There was negligible difference between 4 and 40 threads using the ThreadPool, which means that no significant expense is incurred with the pool, even when you bombard it with requests.

  • There was negligible difference between the ThreadPool and free-threaded versions, which means that the ThreadPool does not have any significant "constant" expense.

  • There was negligible difference between the 4-thread and 40-thread free-threaded versions, which means that .NET doesn't perform any worse than one would expect it to under heavy context switching.
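The "about 3.5x" figure can be checked directly from the table. The following is a minimal sketch that plugs the timings of tests 1 and 2 into the usual speedup and efficiency formulas; the class and method names are made up for illustration and are not part of the benchmark.

```csharp
using System;

static class SpeedupCheck
{
    // Speedup = serial time / parallel time.
    public static double Speedup(double serialSeconds, double parallelSeconds)
    {
        return serialSeconds / parallelSeconds;
    }

    // Efficiency = speedup / number of cores (1.0 would be perfect scaling).
    public static double Efficiency(double serialSeconds, double parallelSeconds, int cores)
    {
        return Speedup(serialSeconds, parallelSeconds) / cores;
    }

    static void Main()
    {
        // Timings copied from tests 1 and 2 in the results table.
        double serial = 17.9508817;
        double pooled4 = 5.1382026;

        Console.WriteLine("Speedup:    {0:F2}x", Speedup(serial, pooled4));
        Console.WriteLine("Efficiency: {0:F2}", Efficiency(serial, pooled4, 4));
    }
}
```

With these numbers the speedup comes out at roughly 3.49x, i.e. about 87% efficiency on four cores.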

Do we even need a C++ benchmark to compare to? The results are pretty clear: threads in .NET are not slow. Unless you, the programmer, write poor multi-threading code and end up with resource starvation or lock convoys, you really don't need to worry.

With .NET 4.0, the TPL, and the improvements to the ThreadPool (work-stealing queues and all that cool stuff), you have even more leeway to write "questionable" code and still have it run efficiently. You don't get these features out of the box from C++.

For reference, here is the test code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;

namespace ThreadingTest
{
    class Program
    {
        private static int PrimeMax = 500000;
        private static int TestRunCount = 800;

        static void Main(string[] args)
        {
            Console.WriteLine("Test | Threads | ThreadPool | Time");
            Console.WriteLine("-----+---------+------------+--------");
            RunTest(1, 1, false);
            RunTest(2, 4, true);
            RunTest(3, 40, true);
            RunTest(4, 4, false);
            RunTest(5, 40, false);
            Console.WriteLine("Done!");
            Console.ReadLine();
        }

        static void RunTest(int sequence, int threadCount, bool useThreadPool)
        {
            TimeSpan duration = Time(() => GeneratePrimes(threadCount, useThreadPool));
            Console.WriteLine("{0} | {1} | {2} | {3}",
                sequence.ToString().PadRight(4),
                threadCount.ToString().PadRight(7),
                useThreadPool.ToString().PadRight(10),
                duration);
        }

        static TimeSpan Time(Action action)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            action();
            sw.Stop();
            return sw.Elapsed;
        }

        static void GeneratePrimes(int threadCount, bool useThreadPool)
        {
            if (threadCount == 1)
            {
                TestPrimes(TestRunCount);
                return;
            }

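            // Integer division: any remainder of TestRunCount / threadCount is dropped,
            // so use thread counts that divide TestRunCount evenly (4 and 40 both divide 800).
            int testsPerThread = TestRunCount / threadCount;
            // Countdown of unfinished workers; the last one to finish signals the event.
            int remaining = threadCount;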
            using (ManualResetEvent finishedEvent = new ManualResetEvent(false))
            {
                for (int i = 0; i < threadCount; i++)
                {
                    Action testAction = () =>
                    {
                        TestPrimes(testsPerThread);
                        if (Interlocked.Decrement(ref remaining) == 0)
                        {
                            finishedEvent.Set();
                        }
                    };

                    if (useThreadPool)
                    {
                        ThreadPool.QueueUserWorkItem(s => testAction());
                    }
                    else
                    {
                        ThreadStart ts = new ThreadStart(testAction);
                        Thread th = new Thread(ts);
                        th.Start();
                    }
                }
                finishedEvent.WaitOne();
            }
        }

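        // Enumerating forces the lazy sequence to be evaluated. NoOptimization asks the JIT
        // not to optimize this method, presumably so the otherwise unused loop is not elided.
        [MethodImpl(MethodImplOptions.NoOptimization)]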
        static void IteratePrimes(IEnumerable<int> primes)
        {
            int count = 0;
            foreach (int prime in primes) { count++; }
        }

        static void TestPrimes(int testRuns)
        {
            for (int t = 0; t < testRuns; t++)
            {
                var primes = Primes.GenerateUpTo(PrimeMax);
                IteratePrimes(primes);
            }
        }
    }
}

And here is the prime generator:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ThreadingTest
{
    public class Primes
    {
        public static IEnumerable<int> GenerateUpTo(int maxValue)
        {
            if (maxValue < 2)
                return Enumerable.Empty<int>();

            bool[] primes = new bool[maxValue + 1];
            for (int i = 2; i <= maxValue; i++)
                primes[i] = true;

            for (int i = 2; i < Math.Sqrt(maxValue + 1) + 1; i++)
            {
                if (primes[i])
                {
                    for (int j = i * i; j <= maxValue; j += i)
                        primes[j] = false;
                }
            }

            return Enumerable.Range(2, maxValue - 1).Where(i => primes[i]);
        }
    }
}

If you see any obvious flaws in the test, let me know. Barring any serious problems with the test itself, I think the results speak for themselves, and the message is clear:

Don't listen to anyone who makes overly broad and unqualified statements about how the performance of .NET or any other language/environment is "bad" in some particular area, because they are probably talking out of their... rear ends.

You may want to have a look at System.Threading.Tasks, introduced in .NET 4.

It introduces a scalable way to use threads through tasks, with a really cool job-sharing mechanism.
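As a rough illustration of what the task-based style looks like, here is a minimal sketch that fans a CPU-bound job out over several tasks and waits for all of them. `CountPrimes` is a hypothetical stand-in workload (simple trial division, not the sieve from the benchmark), and `TaskSketch` and `RunAll` are names invented for this example. `Task.Factory.StartNew` is used because it is available in .NET 4.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class TaskSketch
{
    // Hypothetical CPU-bound workload: counts the primes below 'limit' by trial division.
    public static int CountPrimes(int limit)
    {
        int count = 0;
        for (int n = 2; n < limit; n++)
        {
            bool isPrime = true;
            for (int d = 2; d * d <= n; d++)
            {
                if (n % d == 0) { isPrime = false; break; }
            }
            if (isPrime) count++;
        }
        return count;
    }

    // Starts one task per work item, waits for all of them, and collects the results.
    // The scheduler decides how many pool threads actually run the tasks.
    public static int[] RunAll(int taskCount, int limit)
    {
        Task<int>[] tasks = Enumerable.Range(0, taskCount)
            .Select(_ => Task.Factory.StartNew(() => CountPrimes(limit)))
            .ToArray();
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }

    static void Main()
    {
        int[] results = RunAll(8, 100000);
        Console.WriteLine("Each task found {0} primes", results[0]);
    }
}
```

Note that the code never creates a thread itself; it only describes units of work, which is what lets the runtime balance them across cores.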

By the way, I don't know who told you that .NET was not good with concurrency. All of my applications use threads at some point or another, but don't forget that having 10 threads on a 2-core processor is rather counterproductive, depending on the type of work you make them do. If the tasks are waiting on network resources, then it may make sense.
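One way to act on the threads-versus-cores point for CPU-bound work is to cap the degree of parallelism at the number of cores. The sketch below assumes .NET 4's `Parallel.For` with `ParallelOptions.MaxDegreeOfParallelism`; the class name and the sum-of-squares workload are purely illustrative.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class DegreeSketch
{
    // For CPU-bound work, one worker per core is a reasonable starting point;
    // I/O-bound work that mostly waits can justify more.
    public static int CpuBoundWorkers()
    {
        return Environment.ProcessorCount;
    }

    // Runs 'itemCount' iterations with at most 'maxWorkers' running concurrently
    // and returns the sum of the squares of 0..itemCount-1.
    public static long SumOfSquares(int itemCount, int maxWorkers)
    {
        long total = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };
        Parallel.For(0, itemCount, options, i =>
        {
            // Interlocked keeps the shared total correct; a real workload would
            // do far more work per iteration than this toy example.
            Interlocked.Add(ref total, (long)i * i);
        });
        return total;
    }

    static void Main()
    {
        int workers = CpuBoundWorkers();
        Console.WriteLine("Workers: {0}, sum: {1}", workers, SumOfSquares(1000, workers));
    }
}
```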

Anyway, don't fear .NET for performance; it's actually quite good.

This is a myth. .NET does a very good job of managing concurrency and is very scalable.

If you can, I'd recommend using .NET 4 and the Task Parallel Library. It simplifies many concurrency issues. For details, I'd recommend looking at the MSDN center for Parallel Computing with Managed Code.

If you're interested in details of implementation, I also have a very detailed series on Parallelism in .NET.

.NET concurrency performance is going to be pretty close to that of applications written in native code. System.Threading is a very thin layer over the underlying threading API.

Whoever warned you may have noticed that, because multithreaded applications are much easier to write in .NET, they are sometimes written by less experienced programmers who don't fully understand concurrency. That is not a technical limitation.

If anecdotal evidence helps: at my last job, we wrote a heavily concurrent trading application that processed over 20,000 market data events per second and updated a massive "main form" grid with the relevant data, all through a fairly massive threading architecture and all in C# and VB.NET. Because of the complexity of the application, we optimized many areas, but never saw an advantage to rewriting the threading code in native C++.

First, you should seriously reconsider whether you need a lot of threads or just a few. It's not that .NET threads are slow. Threads are slow. Task switching is an expensive operation no matter who wrote the algorithm.

This is a place, like many others, where design patterns can help. There are already good answers that touch on this fact, so I'll just make it explicit. You are better off using a command pattern to marshal work into a few worker threads and then getting that work done as quickly as possible in sequence. The alternative, spinning up a bunch of threads to do a bunch of work in "parallel", often isn't really parallel at all: the work is divided into little chunks that are woven together by the scheduler.

In other words: you are better off using your own mind and knowledge to divide the work into units of value, and to decide where the boundaries between those units lie, than letting some generic solution like the operating system decide for you.
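A minimal sketch of that idea, assuming .NET 4's `BlockingCollection<T>`: commands are plain `Action` delegates pushed onto a shared queue, and a small fixed set of worker threads executes them one after another. `WorkQueue` and `WorkQueueDemo` are hypothetical names invented for this example, not an existing library type.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: a few long-lived workers draining a queue of commands.
sealed class WorkQueue : IDisposable
{
    private readonly BlockingCollection<Action> _commands = new BlockingCollection<Action>();
    private readonly Thread[] _workers;

    public WorkQueue(int workerCount)
    {
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(Consume) { IsBackground = true };
            _workers[i].Start();
        }
    }

    // Callers hand over a unit of work instead of starting a thread for it.
    public void Enqueue(Action command)
    {
        _commands.Add(command);
    }

    private void Consume()
    {
        // Blocks while the queue is empty; the loop ends once CompleteAdding
        // has been called and the queue has been drained.
        foreach (Action command in _commands.GetConsumingEnumerable())
        {
            command();
        }
    }

    // Stops accepting work, then waits for the queued commands to finish.
    public void Dispose()
    {
        _commands.CompleteAdding();
        foreach (Thread worker in _workers) worker.Join();
        _commands.Dispose();
    }
}

static class WorkQueueDemo
{
    // Pushes 'commandCount' trivial commands through 'workerCount' workers
    // and returns how many of them ran.
    public static int RunCommands(int commandCount, int workerCount)
    {
        int completed = 0;
        using (var queue = new WorkQueue(workerCount))
        {
            for (int i = 0; i < commandCount; i++)
            {
                queue.Enqueue(() => Interlocked.Increment(ref completed));
            }
        } // Dispose drains the queue before returning.
        return completed;
    }

    static void Main()
    {
        Console.WriteLine("Completed: {0}", RunCommands(1000, 2));
    }
}
```

The sketch leaves out error handling: a command that throws would terminate its worker thread, so a real version would need to catch and report exceptions inside `Consume`.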

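Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.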

 