Measure contention in wait-free multi-threaded Java programs

Original post 2014-05-13 11:56:38 8 2 java/ multithreading/ performance/ contention

I have a wait-free implementation of binary search trees, but I cannot find a concrete method to measure thread contention. By contention, I mean the number of threads that try to access the same piece of memory at the same time.

So far I have looked at the ThreadMXBean and ThreadInfo classes, but since no locks are involved, I haven't found a solution yet.

2 Answers

There is no way to measure contention on a "memory location" without prohibitive performance costs. Direct measurement (e.g. a properly synchronized counter wrapping all the accesses) will introduce artificial bottlenecks, which will destroy the reliability of the test.

"Same time" is loosely defined at the scale you want to measure, because only a single CPU "owns" a particular location in memory at any given time. The best you can do in this case is to measure the rate at which CPUs are dealing with memory conflicts, e.g. through the HW counters. Doing that requires an understanding of the memory subsystem on a given platform. Also, the HW counters are attributed to machine (= CPU) state, not memory state; in other words, you can estimate how many conflicts particular instructions have experienced, not how many CPUs accessed a given memory location.

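As a software-level analogue of that last point, and assuming the data structure is built on compare-and-set retry loops, each thread can count its own failed CAS attempts in a local variable and publish the total only when it finishes. The sketch below is illustrative only: `CasRetryDemo`, `Result` and the single `AtomicLong` standing in for a contended tree node are hypothetical names, not taken from the question's code. It reports conflicts experienced by operations, not the number of threads touching a location.

```java
import java.util.concurrent.atomic.AtomicLong;

final class CasRetryDemo {
    /** Outcome of one run: final counter value and total failed CAS attempts. */
    static final class Result {
        final long value;
        final long failedCas;
        Result(long value, long failedCas) {
            this.value = value;
            this.failedCas = failedCas;
        }
    }

    static Result run(int threads, int opsPerThread) {
        AtomicLong shared = new AtomicLong();   // hypothetical stand-in for a contended node
        long[] failures = new long[threads];    // one slot per thread, written once at the end
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long localFailures = 0;         // thread-local, so counting adds no sharing
                for (int i = 0; i < opsPerThread; i++) {
                    while (true) {
                        long seen = shared.get();
                        if (shared.compareAndSet(seen, seen + 1)) {
                            break;
                        }
                        localFailures++;        // another thread updated the value first
                    }
                }
                failures[id] = localFailures;
            });
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < threads; t++) {
            try {
                workers[t].join();              // join makes the worker's writes visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += failures[t];
        }
        return new Result(shared.get(), total);
    }

    public static void main(String[] args) {
        Result r = run(4, 1_000_000);
        System.out.println("value=" + r.value + " failedCas=" + r.failedCas);
    }
}
```

The ratio of failed attempts to successful operations gives a rough conflict rate; it varies from run to run and with the number of cores, so it is best compared across thread counts rather than read as an absolute figure.
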
Trying to measure within the source of the contention is the wrong approach. What might be the reason for the contention anyway?!

So, first of all, set up a benchmarking suite which runs typical access patterns on your data structure, and graph the performance for different thread counts. Here is a nice example from the nitro cache performance page.
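
A minimal sketch of such a scaling run might look like the following. It is an assumption-laden illustration, not a definitive harness: `ScalingBenchmark`, `KEY_RANGE` and the 80/10/10 read/insert/remove mix are made up for the example, and `ConcurrentSkipListMap` merely stands in for the wait-free tree under test.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ThreadLocalRandom;

final class ScalingBenchmark {
    static final int KEY_RANGE = 100_000;   // hypothetical key space

    /** Runs a mixed workload on a fresh map and returns aggregate throughput. */
    static double opsPerSecond(int threads, int opsPerThread) {
        // Stand-in for the data structure under test; swap in your own implementation.
        ConcurrentSkipListMap<Integer, Integer> map = new ConcurrentSkipListMap<>();
        for (int k = 0; k < KEY_RANGE; k += 2) {
            map.put(k, k);                   // pre-fill half the key space
        }
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < opsPerThread; i++) {
                    int key = rnd.nextInt(KEY_RANGE);
                    int op = rnd.nextInt(10);
                    if (op == 0) {
                        map.put(key, key);   // 10% inserts
                    } else if (op == 1) {
                        map.remove(key);     // 10% removals
                    } else {
                        map.get(key);        // 80% lookups
                    }
                }
            });
        }
        long start = System.nanoTime();
        for (Thread w : workers) {
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return (double) threads * opsPerThread / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        for (int threads = 1; threads <= cores; threads *= 2) {
            opsPerSecond(threads, 200_000);  // warm-up run, result discarded
            System.out.printf("%d threads: %.0f ops/s%n",
                    threads, opsPerSecond(threads, 1_000_000));
        }
    }
}
```

Plotting the printed throughput against the thread count gives the scaling curve. A hand-rolled loop like this is crude; a dedicated harness such as JMH handles warm-up and dead-code elimination more carefully.
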

If you scale almost linearly: congrats, you are done!

If you don't scale linearly, you need more insight. Now you need to profile the system as a whole and find the cause, e.g. of CPU pipeline stalls. The best way to do this is low-overhead tracing. On Linux you can use OProfile. OProfile also has Java support, which helps you correlate the JIT-compiled machine code with your Java program.