
Reporting JVM's CPU usage with Dropwizard metrics

I use Dropwizard Metrics to measure various metrics in my application. There are several predefined reporters in its JVM instrumentation, but strangely I could not find any that reports CPU usage.

I could create my own Gauge (using getThreadCpuTime or similar), but my best guess is that I am missing something.

Did I miss it in the current implementation, or is it more complex than I first thought?

I don't know much about Dropwizard, but I've used ThreadMXBean in the past to provide estimates of CPU utilization in scalable distributed computing systems, so I'll share what I think is relevant to the question. Things are definitely more complicated than they may first appear:

ThreadMXBean is somewhat misleading ...

ThreadMXBean.getThreadCpuTime(id) only returns the total time, in nanoseconds, that a particular thread has spent executing code on the CPU since the thread started. It provides no information on how long your thread has been blocked or waiting (sleeping), so on its own it doesn't give you a good idea of CPU usage. You also need to measure total blocked and waited time, and then keep track of all three values over the runtime of your program. Oddly enough, ThreadMXBean has no methods to directly obtain blocked or waited time, so you may be tempted to give up.
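To make that point concrete, here is a minimal sketch using only the JDK. The class and method names (CpuTimeDemo, cpuNanosFor, sleepQuietly) are made up for illustration. It measures the CPU time the calling thread consumes while running a task, so you can compare a task that sleeps with one that spins for the same wall-clock duration. It returns -1 when the JVM does not support or has not enabled thread CPU timing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuTimeDemo {
    /** CPU nanoseconds the calling thread consumed while running work, or -1 if unavailable. */
    static long cpuNanosFor(Runnable work) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long before = bean.getCurrentThreadCpuTime();
        work.run();
        return bean.getCurrentThreadCpuTime() - before;
    }

    /** Sleeps without forcing callers to handle InterruptedException. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Same wall-clock duration, very different CPU time.
        long sleeping = cpuNanosFor(() -> sleepQuietly(200));
        long spinning = cpuNanosFor(() -> {
            long end = System.nanoTime() + 200_000_000L;
            while (System.nanoTime() < end) {
                // busy-wait on purpose
            }
        });
        System.out.println("CPU ns while sleeping: " + sleeping);
        System.out.println("CPU ns while spinning: " + spinning);
    }
}
```

The sleeping task should report far less CPU time than its 200 ms of wall-clock time, which is why the CPU time counter alone says nothing about how long the thread was idle.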

... but you can use it to get a ThreadInfo object ...

First, to enable this, call these two lines (this may throw an exception if your JVM doesn't support it):

ManagementFactory.getThreadMXBean().setThreadCpuTimeEnabled(true);
ManagementFactory.getThreadMXBean().setThreadContentionMonitoringEnabled(true);
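If you would rather not rely on catching that exception, the two calls above can be guarded with the support checks that ThreadMXBean exposes. This is only a sketch; the class and method names (ThreadMonitoringSetup, enableIfSupported) are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class ThreadMonitoringSetup {
    /** Returns true only if both CPU timing and contention monitoring ended up enabled. */
    static boolean enableIfSupported() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean cpu = false;
        boolean contention = false;
        // Each setter throws UnsupportedOperationException when the feature is missing,
        // so check for support first.
        if (bean.isThreadCpuTimeSupported()) {
            bean.setThreadCpuTimeEnabled(true);
            cpu = bean.isThreadCpuTimeEnabled();
        }
        if (bean.isThreadContentionMonitoringSupported()) {
            bean.setThreadContentionMonitoringEnabled(true);
            contention = bean.isThreadContentionMonitoringEnabled();
        }
        return cpu && contention;
    }

    public static void main(String[] args) {
        System.out.println("Thread monitoring enabled: " + enableIfSupported());
    }
}
```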

Now you can call ThreadMXBean.getThreadInfo(threadId) to get an instance of ThreadInfo corresponding to a particular thread. This info object has two methods, getBlockedTime() and getWaitedTime(), which return the total number of milliseconds your thread has spent in each of those states (or -1 if contention monitoring is disabled). There is no getCpuTime() method (which, if you ask me, is a tremendously silly shortcoming of this object), but if you know when your thread was started, you can do something like this:

// Initialized somewhere else:
ThreadMXBean bean = ...
long threadStartTime = System.currentTimeMillis();
Thread myThread = ...

// Inside your metrics-gathering code:
long now = System.currentTimeMillis();
ThreadInfo info = bean.getThreadInfo(myThread.getId());
// Time alive minus time spent blocked or waiting, in milliseconds
long totalCpuTime = (now - threadStartTime) - (info.getBlockedTime() + info.getWaitedTime());

Now you can compute Thread utilization as a percentage.

We're almost there, but we're not quite done yet. Each time we go through the final three lines of the code I posted above, we're only gathering total times for the executing/blocked/waiting states of our thread. To compute a percentage, we need to keep track of when we gathered these metrics so we know how much time the thread spent in each of those states since the last metrics update. So, do something like this:

class ThreadUsageMetrics {
    // All values are in milliseconds
    long timestamp, totalBlockedTime, totalWaitTime;

    ThreadUsageMetrics(long ts, long blocked, long wait) {
        timestamp = ts;
        totalBlockedTime = blocked;
        totalWaitTime = wait;
    }

    // Fraction of the interval since prev that was not spent blocked or waiting
    double computeCpuUsageSince(ThreadUsageMetrics prev) {
        long time = timestamp - prev.timestamp;
        if (time <= 0) {
            return 0.0; // no time elapsed; avoid dividing by zero
        }
        long blocked = totalBlockedTime - prev.totalBlockedTime;
        long waited = totalWaitTime - prev.totalWaitTime;
        return (time - (blocked + waited)) / (double) time;
    }
}

This will give us a double in the range from 0.0 to 1.0 indicating CPU usage as a fraction of total time since the last metrics update. I'm assuming you can convert this value into a percentage and feed it to an instance of Dropwizard's Gauge every 5 seconds or so. On my project, this is how we have estimated CPU usage for several years and it's worked great for us.
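Putting the pieces together, a sampler along these lines could take a snapshot of a thread on each metrics tick and compare it with the previous one. This is a sketch under stated assumptions, not the code from my project: the names ThreadUsageSampler, Snapshot, take and usageBetween are invented here, and treating a -1 from ThreadInfo (contention monitoring disabled) as zero is my own choice, which makes the thread look fully busy in that case.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadUsageSampler {
    /** Immutable snapshot of one thread; all values in milliseconds. */
    static final class Snapshot {
        final long timestamp, blocked, waited;

        Snapshot(long timestamp, long blocked, long waited) {
            this.timestamp = timestamp;
            this.blocked = blocked;
            this.waited = waited;
        }
    }

    /** Returns null if the thread is not alive or does not exist. */
    static Snapshot take(ThreadMXBean bean, long threadId) {
        ThreadInfo info = bean.getThreadInfo(threadId);
        if (info == null) {
            return null;
        }
        // Assumption: -1 (contention monitoring disabled) is treated as 0.
        long blocked = Math.max(0, info.getBlockedTime());
        long waited = Math.max(0, info.getWaitedTime());
        return new Snapshot(System.currentTimeMillis(), blocked, waited);
    }

    /** Estimated fraction (0.0 to 1.0) of the interval not spent blocked or waiting. */
    static double usageBetween(Snapshot prev, Snapshot cur) {
        long elapsed = cur.timestamp - prev.timestamp;
        if (elapsed <= 0) {
            return 0.0;
        }
        long idle = (cur.blocked - prev.blocked) + (cur.waited - prev.waited);
        double usage = (elapsed - idle) / (double) elapsed;
        // Clamp, since the timers are approximate and may slightly overshoot.
        return Math.min(1.0, Math.max(0.0, usage));
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean.isThreadContentionMonitoringSupported()) {
            bean.setThreadContentionMonitoringEnabled(true);
        }
        long id = Thread.currentThread().getId();
        Snapshot prev = take(bean, id);
        Thread.sleep(500);
        Snapshot cur = take(bean, id);
        System.out.println("Estimated usage: " + usageBetween(prev, cur));
    }
}
```

To report this through Dropwizard, one option would be a Gauge<Double> whose getValue() takes a new snapshot, returns usageBetween(previous, current), and stores the new snapshot for the next call.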

A couple of notes on this: we don't actually need to explicitly store total CPU time in this object, because any time not spent blocking or waiting is either execution time or time spent during context switching. We have no way to know context switch time, but it's safe to assume that total context switching time is negligible in 99.9% of cases.

Here's the caveat: we aren't truly measuring CPU usage.

If you've read carefully, you'll notice I've said we're "estimating" CPU usage. The reason I say this is that we're measuring the total execution time of a particular Java Thread. Java provides no concept of actual CPU hardware usage; what we have is merely the total time a thread has spent executing. This is further muddied by things like Hyper-Threading, where time spent "executing" may actually mean time spent waiting for the other thread to get off the ALU or memory bus. I think this provides a good measure of when code is running on a physical hardware thread, but if you want to measure actual CPU usage, you won't be able to do it in pure Java.
