Profiling/optimising heavily multithreaded application

I'm writing a performance-critical .NET application which makes heavy use of multithreading.

In the Visual Studio performance profiler, the top functions by Exclusive samples are:

  • WaitHandle.WaitAny() - 14.23%
  • @JIT_MonReliableEnter@8 - 7.76%
  • Monitor.Enter - 5.09%

Basically, my top three functions are threading primitives and, I believe, to some extent out of my control. My work/processing routines are pretty small in comparison, and I'm trying to increase performance. I believe the algorithms involved are pretty sound, although I review them fairly frequently.

My questions are:

  • If 14.23% of CPU samples fall in these methods, is the CPU effectively 'idle' for most of those samples, i.e. just waiting on other threads? Or is the idle part of a thread wait not shown in the profile trace at all [so that the 27.08% shown for these three is the sum of all overhead within those sync methods]? (I'd guess it's mostly idle, but I'd appreciate some decent reference material behind answers to this one, please.)
  • I have reviewed my locking schemes, but do these results point to a particular bottleneck or technique I should look into for further optimisation?
  • Is WaitAny particularly poor? I use it heavily to check whether particular queue objects are readable/writable while also checking an abort flag. Is there a better way to do that?
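The asker's code isn't shown, so here is a hypothetical reconstruction of the pattern described in the last question, written only to make the WaitAny semantics concrete. The names `abort`, `itemAvailable` and the use of a `ConcurrentQueue<int>` are assumptions. The consumer waits on an abort event and an "item available" event at once; `WaitHandle.WaitAny` returns the array index of the handle that satisfied the wait, and when several are signalled it reports the smallest index, which is why `abort` is placed first.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical reconstruction: a consumer that waits on "abort" and "item available" together.
using var abort = new ManualResetEvent(false);        // set once to stop the consumer
using var itemAvailable = new AutoResetEvent(false);  // set by the producer after each enqueue
var queue = new ConcurrentQueue<int>();
WaitHandle[] handles = { abort, itemAvailable };      // abort first, so it wins ties

int sum = 0, processed = 0;
var consumer = new Thread(() =>
{
    while (true)
    {
        // WaitAny returns the index of the signalled handle (smallest index if several are set).
        if (WaitHandle.WaitAny(handles) == 0)
            break;                                    // abort requested
        while (queue.TryDequeue(out int item))        // drain everything queued so far
        {
            Interlocked.Add(ref sum, item);
            Interlocked.Increment(ref processed);
        }
    }
});
consumer.Start();

for (int i = 1; i <= 100; i++)
{
    queue.Enqueue(i);
    itemAvailable.Set();                              // wake the consumer
}

// Let the consumer finish the backlog, then ask it to stop.
SpinWait.SpinUntil(() => Volatile.Read(ref processed) == 100);
abort.Set();
consumer.Join();
Console.WriteLine($"processed {processed} items, sum {sum}");
```

Every enqueue is followed by a `Set`, and the consumer only goes back to waiting after it has seen an empty queue, so no item is left behind even though an `AutoResetEvent` coalesces repeated signals.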

Your CPU isn't necessarily idle when a thread is in WaitHandle.WaitAny or Monitor.Enter. A thread that is waiting is idle, but presumably other threads are busy executing. This is especially true of Monitor.Enter: if a thread is blocked on a lock, one would hope the thread holding that lock is executing code rather than sitting idle.
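A small sketch of that point, independent of the asker's code: one thread holds a lock while doing CPU-bound work, and a second thread blocks in Monitor.Enter (through C#'s `lock` statement, which the compiler expands into Monitor.Enter/Monitor.Exit) until the lock is released. The names `gate`, `holderHasLock` and the iteration count are illustrative only.

```csharp
using System;
using System.Threading;

var gate = new object();
using var holderHasLock = new ManualResetEventSlim(false);
long iterations = 0;          // incremented only while holding `gate`
long seenByWaiter = -1;       // what the waiting thread observes once it gets the lock

var holder = new Thread(() =>
{
    lock (gate)
    {
        holderHasLock.Set();                  // let the waiter start contending
        for (int i = 0; i < 20_000_000; i++)
            iterations++;                     // CPU-bound work done while the lock is held
    }
});

var waiter = new Thread(() =>
{
    holderHasLock.Wait();                     // make sure the holder owns the lock first
    lock (gate)                               // blocks in Monitor.Enter; this thread is idle
    {
        seenByWaiter = iterations;            // runs only after the holder releases the lock
    }
});

holder.Start();
waiter.Start();
holder.Join();
waiter.Join();
Console.WriteLine($"waiter acquired the lock after {seenByWaiter:N0} iterations of work");
```

While the waiter is blocked, the core is still busy running the holder's loop, which is the situation the answer hopes for when a thread shows up in Monitor.Enter.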

Also, if your thread uses WaitAny to read from a queue, then it's likely that the queue is simply empty. That's not a performance problem for the consumer code; it just means the producer isn't putting items into the queue fast enough. That might be because the producer is slow, or because data isn't coming in fast enough.

If you're processing data faster than it comes in, then it doesn't look like you have a performance problem, certainly not on the consumer side.

As for using WaitAny for queuing, I would suggest using BlockingCollection<T> and its methods that take a CancellationToken, such as TryAdd(T, Int32, CancellationToken). Converting to cancellation tokens really simplified my multithreaded queuing code.
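A minimal sketch of that suggestion, assuming a single producer and a single consumer of `int` items (the capacity of 16 and the 100 ms timeout are arbitrary). A single CancellationTokenSource stands in for the separate abort flag, and BlockingCollection<T> does the "readable/writable" waiting internally, so no hand-written WaitAny is needed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// A bounded queue: adding blocks while it is full, taking blocks while it is empty.
using var queue = new BlockingCollection<int>(boundedCapacity: 16);
// One token replaces the separate abort flag; call cts.Cancel() to stop both sides.
using var cts = new CancellationTokenSource();

var producer = Task.Run(() =>
{
    for (int i = 1; i <= 100; i++)
    {
        // Waits up to 100 ms for free space: returns false on timeout and throws
        // OperationCanceledException if the token is cancelled while waiting.
        while (!queue.TryAdd(i, 100, cts.Token))
        {
            // Queue stayed full for 100 ms; a real producer might log or back off here.
        }
    }
    queue.CompleteAdding();   // no more items: lets the consumer's loop finish
});

int sum = 0;
try
{
    // Blocks until an item is available or adding is complete, observing the token.
    foreach (int item in queue.GetConsumingEnumerable(cts.Token))
        sum += item;
}
catch (OperationCanceledException)
{
    // Only reached if cts.Cancel() is called, e.g. during shutdown.
}

producer.Wait();
Console.WriteLine($"sum = {sum}");
```

If the consumer also needs a timeout, BlockingCollection<T> has a matching TryTake(out T, Int32, CancellationToken) overload that can replace the foreach loop.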

The profiling statistics do not include the time when threads were blocked.

A sampling-based profiler essentially asks each core to report back after every X (say, 1,000,000) non-idle cycles. Each time a core reports back, the profiler records the current call stack, and the profiling results are reconstructed from those recorded call stacks.

From the profiling results, you know that during 14.23% of the time a core was doing work, it was executing instructions in WaitHandle.WaitAny. If your program is CPU-bound, optimizing the WaitAny part (e.g., by using a different primitive) could have a significant impact on performance. However, if the program is not CPU-bound and spends most of its time waiting on a server, a disk, another process or some other external input, then optimizing the WaitAny-related code will not help much.
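To put a rough ceiling on what optimizing WaitAny could buy, one can apply Amdahl's law (a standard bound, not something the answer itself invokes). Under the optimistic assumptions that the program is fully CPU-bound, that sample share equals time share, and that a fraction $s$ of the work could be eliminated entirely, the best possible speedup is:

```latex
S_{\max} = \frac{1}{1 - s}
\qquad
s = 0.1423 \;(\text{WaitAny only}) \;\Rightarrow\; S_{\max} = \frac{1}{0.8577} \approx 1.166
\qquad
s = 0.2708 \;(\text{all three}) \;\Rightarrow\; S_{\max} = \frac{1}{0.7292} \approx 1.371
```

So even removing every synchronization sample in the profile would make a CPU-bound program at most about 37% faster; if the program is not CPU-bound, the real gain is smaller.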

So your next step should be to find out what your program's CPU utilization is. Also, the Concurrency Visualizer that Ilian mentioned can help you understand how the threads in your program spend their time.
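One rough way to get that number from inside the program is to compare the process's consumed CPU time with elapsed wall-clock time. The sketch below is an assumption-laden illustration (the helper name `MeasureCpuUtilization` and the two toy workloads are hypothetical); it only gives a whole-process average, whereas the profiler and Concurrency Visualizer show per-thread detail.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Average CPU utilization of the current process while `workload` runs, as a fraction
// of all logical processors: CPU time consumed / (elapsed wall-clock time * core count).
static double MeasureCpuUtilization(Action workload)
{
    using var process = Process.GetCurrentProcess();
    TimeSpan cpuBefore = process.TotalProcessorTime;
    var wall = Stopwatch.StartNew();

    workload();

    wall.Stop();
    process.Refresh();                        // discard any cached process information
    TimeSpan cpuUsed = process.TotalProcessorTime - cpuBefore;
    return cpuUsed.TotalMilliseconds /
           (wall.Elapsed.TotalMilliseconds * Environment.ProcessorCount);
}

// A blocked thread (here, sleeping) uses almost no CPU...
double sleeping = MeasureCpuUtilization(() => Thread.Sleep(300));

// ...while a spinning thread keeps one core busy for the whole interval.
double spinning = MeasureCpuUtilization(() =>
{
    var sw = Stopwatch.StartNew();
    while (sw.ElapsedMilliseconds < 300) { } // busy-wait on one thread
});

Console.WriteLine($"sleeping: {sleeping:P1}, spinning: {spinning:P1} of {Environment.ProcessorCount} cores");
```

If the real application's figure is well below 100% of the cores it could use, the program is likely waiting on something external, and shaving cycles off WaitAny is unlikely to pay off.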

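Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.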

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM