
Profiling/optimising heavily multithreaded application

I'm writing a performance-critical .NET application which makes heavy use of multithreading.

Using the Visual Studio performance profiler, the top functions with Exclusive samples are:

WaitHandle.WaitAny() - 14.23%

@JIT_MonReliableEnter@8 - 7.76%

Monitor.Enter - 5.09%

Basically, my top three functions are all threading primitives, which I believe are out of my control to some extent. My own work/processing routines are small in comparison, and I'm trying to increase performance. I believe the algorithms involved are sound, although I review them fairly frequently.

My questions are:

  • If 14.23% of CPU samples fall in WaitHandle.WaitAny(), is the CPU effectively 'idle' for most of those samples, i.e. just waiting on other threads? Or is the idle part of the thread waits left out of the profile trace, so that the 27.08% shown for these three functions is the sum of all overhead within those sync methods? (My guess is that this is mostly idle time, but I would appreciate some decent reference material behind answers to this one, please.)
  • I have reviewed my locking schemes. Do these results nevertheless point to a particular bottleneck or technique I should look into for further optimisation?
  • Is WaitAny particularly poor? I use it heavily to check whether particular queue objects are readable/writable while also checking an abort flag at the same time. Is there a better way to do that?

Your CPU isn't necessarily idle when a thread is in a WaitHandle.WaitAny or a Monitor.Enter. A thread that's in a wait is idle, but presumably other threads are busy executing. This is especially true of Monitor.Enter. If a thread is blocked on a lock, then one would hope the thread that has that lock is executing code rather than sitting idle.

Also, if your thread is using the WaitAny to read from a queue, then it's likely that the queue simply doesn't have anything in it. That's not a performance problem for the consumer code. It just means that the producer isn't putting things into the queue fast enough. Now, that might be because the producer is slow, or because data isn't coming in fast enough.

If you're processing data faster than it can come in, then it doesn't look like you have a performance problem. Certainly not on the consumer side.

As for using WaitAny for queuing, I would suggest that you use BlockingCollection and the methods that take a cancellation token, like TryAdd(T, Int32, CancellationToken). Converting to cancellation tokens really simplified my multithreaded queuing code.
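As a rough illustration of that suggestion, here is a minimal sketch of a producer/consumer pair built on BlockingCollection<int>, with a CancellationToken standing in for the abort flag. The class name, the capacity of 100, the 1000 ms timeout and the item count are all made up for the example; they are not taken from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class QueueDemo
{
    static void Main()
    {
        // Bounded queue: producers wait (or time out) when it is full.
        // The capacity of 100 is an arbitrary value for this sketch.
        using (var queue = new BlockingCollection<int>(100))
        using (var cts = new CancellationTokenSource())
        {
            Task consumer = Task.Run(() =>
            {
                try
                {
                    // Blocks until an item arrives, adding is completed,
                    // or the token is cancelled (the "abort flag").
                    foreach (int item in queue.GetConsumingEnumerable(cts.Token))
                    {
                        Console.WriteLine("processed " + item);
                    }
                }
                catch (OperationCanceledException)
                {
                    Console.WriteLine("consumer aborted");
                }
            });

            for (int i = 0; i < 5; i++)
            {
                // TryAdd(T, Int32, CancellationToken): wait up to 1000 ms
                // for free space; returns false if the timeout elapses.
                if (!queue.TryAdd(i, 1000, cts.Token))
                {
                    Console.WriteLine("queue full, gave up on " + i);
                }
            }

            // Normal shutdown: the consumer drains the queue and exits.
            // Calling cts.Cancel() instead would take the abort path.
            queue.CompleteAdding();
            consumer.Wait();
        }
    }
}
```

The one blocking call covers both conditions that the WaitAny approach handled with separate wait handles: data becoming available and an abort being requested.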

The profiling statistics do not include the time when threads were blocked.

The sampling-based profiler basically asks each core to report back after every X (say 1,000,000) non-idle cycles. Each time a core reports back, the profiler remembers the current call stack. The profiling results are reconstructed from the call stacks that the profiler recorded.

From the profiling results, you know that for 14.23% of the time a core was doing work, it was executing instructions in WaitHandle.WaitAny. If your program is CPU-bound, optimizing the WaitAny part (e.g., using a different primitive) could have a significant impact on performance. However, if the program is not CPU-bound and spends the majority of its time waiting on a server, disk, another process or some other external input, then optimizing the WaitAny-related code will not be very useful.

So, your next step should be to figure out what the CPU utilization of your program is. Also, note that the Concurrency Visualizer that Ilian mentioned can be useful for understanding how the threads in your program spend their time.
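One rough way to estimate that utilization from inside the process is to compare the CPU time it consumed with the wall-clock time available across all cores. The sketch below assumes the real workload is running on other threads during the measurement window; the Thread.Sleep call and the 5-second window are placeholders, not part of the original program.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class CpuUtilization
{
    static void Main()
    {
        Process self = Process.GetCurrentProcess();
        TimeSpan cpuBefore = self.TotalProcessorTime;
        Stopwatch wall = Stopwatch.StartNew();

        // Placeholder: in a real program the worker threads would be
        // running during this measurement window.
        Thread.Sleep(5000);

        wall.Stop();
        self.Refresh(); // discard cached process data before re-reading
        TimeSpan cpuUsed = self.TotalProcessorTime - cpuBefore;

        // Fraction of the total available core time that was spent
        // executing this process's code (user plus kernel time).
        double utilization = cpuUsed.TotalMilliseconds /
            (wall.Elapsed.TotalMilliseconds * Environment.ProcessorCount);

        Console.WriteLine("CPU utilization: " + utilization.ToString("P1"));
    }
}
```

A value near 100% suggests the program is CPU-bound, so the samples in the synchronization primitives are worth attacking. A low value suggests the threads are mostly blocked and the time is going elsewhere.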

