
High system CPU usage because of System.currentTimeMillis()

I was debugging high system CPU usage (not user CPU usage) on one of our Storm supervisors (a Wheezy machine). Here are the observations.

Output of perf for the relevant process:

Events: 10K cpu-clock
16.40%  java  [kernel.kallsyms]   [k] system_call_after_swapgs
13.95%  java  [kernel.kallsyms]   [k] pvclock_clocksource_read
12.76%  java  [kernel.kallsyms]   [k] do_gettimeofday
12.61%  java  [vdso]              [.] 0x7ffe0fea898f
 9.02%  java  perf-17609.map      [.] 0x7fcabb8b85dc
 7.16%  java  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
 4.97%  java  [kernel.kallsyms]   [k] native_read_tsc
 2.88%  java  [kernel.kallsyms]   [k] sys_gettimeofday
 2.82%  java  libjvm.so           [.] os::javaTimeMillis()
 2.39%  java  [kernel.kallsyms]   [k] arch_local_irq_restore

Caught this in strace of a thread of the relevant process:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000247           0     64038           gettimeofday
  0.00    0.000000           0         1           rt_sigreturn
  0.00    0.000000           0         1           futex
------ ----------- ----------- --------- --------- ----------------
100.00    0.000247                 64040           total

I finally figured out that the thread was running in a while(true) loop and that one of the calls inside it was System.currentTimeMillis(). I disabled that call and system CPU usage went down from 50% to 3%, so clearly that was the issue.

What I fail to understand is that, in the presence of vDSO, these kernel calls should only happen in the user's address space. But as is clear from the perf report, the calls are indeed taking place in kernel space. Any pointers on this?

Kernel version: 3.2.0-4-amd64 Debian 3.2.86-1 x86_64 GNU/Linux
Clock type: kvm

Adding the code of the problematic thread:

@RequiredArgsConstructor
public class TestThread implements Runnable {
    private final Queue<String> queue;
    private final Publisher publisher;
    private final int maxBatchSize;

    private long lastPushTime;
    @Override
    public void run() {
        lastPushTime = System.currentTimeMillis();
        List<String> events = new ArrayList<>();
        while (true) {
            try {
                String message = queue.poll();
                long lastPollTime = System.currentTimeMillis();
                if (message != null) {
                    events.add(message);
                    pushEvents(events, false);
                }

                // if event threshold hasn't reached the size, but it's been there for over 10seconds, push it.
                if ((lastPollTime - lastPushTime > 10000) && (events.size() > 0)) {
                    pushEvents(events, true);
                }
            } catch (Exception e) {
                // Log and do something
            }
        }
    }

    private void pushEvents(List<String> events, boolean forcePush) {
        if (events.size() >= maxBatchSize || forcePush) {
            pushToHTTPEndPoint(events);
            events.clear();
            lastPushTime = System.currentTimeMillis();
        }
    }

    private void pushToHTTPEndPoint(List<String> events) {
        publisher.publish(events);
    }
}

There is nothing else of note inside the loop, so you are spinning on System.currentTimeMillis().

vDSO will help improve the performance of System.currentTimeMillis(), but does it really change the classification of the CPU time from "System" to "User"? I don't know, sorry.

This thread is going to consume 100% of a CPU either way. Does it make much difference whether that time is classified as "System" or "User"?

You should rewrite this code to use a non-spinning wait, for example BlockingQueue.poll(timeout, unit).

What is your actual question here?

"What I fail to understand is that, in the presence of vDSO, these kernel calls should only happen in the user's address space. But as is clear from the perf report, the calls are indeed taking place in kernel space. Any pointers on this?"

Why does it matter how the CPU time spent inside this spin loop is classified?

According to "User CPU time vs System CPU time?", the "System CPU Time" is:

System CPU Time: Amount of time the processor worked on operating system's functions connected to that specific program.

By that definition, time spent spinning on System.currentTimeMillis() would count as system time, even if vDSO meant that it did not require a user-to-kernel mode switch.
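One way to check how a given machine actually accounts for this time is to ask the JVM for the current thread's CPU time split. The following is a minimal sketch, not part of the original answers: the class and method names are made up for illustration, and it relies on the standard ThreadMXBean calls getCurrentThreadCpuTime() and getCurrentThreadUserTime(). The user-time counter can be coarse on some platforms, so treat small differences as noise.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeSplit {
    /**
     * Calls System.currentTimeMillis() in a tight loop and returns
     * {userNanos, systemNanos} consumed by the current thread meanwhile.
     * Returns {-1, -1} if the JVM cannot measure per-thread CPU time.
     */
    static long[] measure(int calls) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            return new long[] {-1, -1};
        }
        long cpuBefore = bean.getCurrentThreadCpuTime();   // user + system
        long userBefore = bean.getCurrentThreadUserTime(); // user only
        long sink = 0;
        for (int i = 0; i < calls; i++) {
            sink += System.currentTimeMillis();
        }
        long cpu = bean.getCurrentThreadCpuTime() - cpuBefore;
        long user = bean.getCurrentThreadUserTime() - userBefore;
        if (sink == 42) {
            System.out.println(); // uses sink so the loop is not optimized away
        }
        // System time is whatever CPU time was not spent in user mode.
        return new long[] {user, Math.max(0L, cpu - user)};
    }

    public static void main(String[] args) {
        long[] split = measure(5_000_000);
        System.out.printf("user=%d ms, system=%d ms%n",
                split[0] / 1_000_000, split[1] / 1_000_000);
    }
}
```

If the system share dominates, the time calls are most likely leaving user space, which is what the perf and strace output in the question suggests.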

Reading your code, there is no control code to block the while loop apart from publisher.publish(events) and queue.poll(), which means this thread is busy in the while loop and never takes a break.

In my opinion, you need to limit the calls to System.currentTimeMillis(). A good choice is to make queue.poll() block. Some pseudocode:

// Pseudocode: queue.poll here stands for a hypothetical batching wrapper, not BlockingQueue.poll itself
while (!stopWork) {
    try {
        // wait up to 10 seconds for messages; return an empty list if none arrive before the timeout
        // this is easy to implement on top of BlockingQueue
        List<String> events = queue.poll(10, TimeUnit.SECONDS);
        if (events.isEmpty()) {
            continue;
        }
        new java.util.Timer().schedule(
            new java.util.TimerTask() {
                @Override
                public void run() {
                    pushEvents(events, true);
                }
            }, 1000 * 10);
    } catch (Exception e) {
        // Log and do something
    }
}
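A runnable variant of the same idea might look like the sketch below. It is an illustration under assumptions, not the original poster's code: BatchingWorker is a made-up name, a Consumer<List<String>> stands in for the Publisher from the question, and the flush window restarts after every flush or empty timeout, which differs slightly from the original lastPushTime logic. The thread sleeps inside BlockingQueue.poll(timeout, unit) instead of spinning, so the clock is read only a few times per message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class BatchingWorker implements Runnable {
    private final BlockingQueue<String> queue;
    private final Consumer<List<String>> publisher; // hypothetical stand-in for Publisher.publish
    private final int maxBatchSize;
    private final long maxWaitNanos;

    public BatchingWorker(BlockingQueue<String> queue, Consumer<List<String>> publisher,
                          int maxBatchSize, long maxWaitMillis) {
        this.queue = queue;
        this.publisher = publisher;
        this.maxBatchSize = maxBatchSize;
        this.maxWaitNanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    }

    @Override
    public void run() {
        List<String> events = new ArrayList<>();
        long deadline = System.nanoTime() + maxWaitNanos;
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Sleep inside poll until a message arrives or the flush deadline passes.
                long remaining = Math.max(0L, deadline - System.nanoTime());
                String message = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (message != null) {
                    events.add(message);
                }
                boolean timedOut = System.nanoTime() - deadline >= 0;
                if (events.size() >= maxBatchSize || timedOut) {
                    flush(events);
                    deadline = System.nanoTime() + maxWaitNanos; // start a new window
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag so the loop exits
            }
        }
        flush(events); // publish whatever is left on shutdown
    }

    private void flush(List<String> events) {
        if (!events.isEmpty()) {
            publisher.accept(new ArrayList<>(events)); // copy, because events is reused
            events.clear();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread worker = new Thread(new BatchingWorker(
                queue, batch -> System.out.println("push " + batch), 2, 500));
        worker.start();
        queue.add("a");
        queue.add("b");
        queue.add("c");
        Thread.sleep(1500);
        worker.interrupt(); // request shutdown
        worker.join();
    }
}
```

Interrupting the thread is used as the stop signal here in place of the stopWork flag, because an interrupt also wakes the thread if it is parked inside poll.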

"What I fail to understand is that, in the presence of vDSO, these kernel calls should only happen in the user's address space. But as is clear from the perf report, the calls are indeed taking place in kernel space. Any pointers on this?"

The vDSO fast path may be unavailable on a virtual system. KVM uses PVClock (you could read more about it in this nice article), and the behaviour depends on the kernel version. For example, we could see here that VCLOCK_MODE is never overridden. On the other hand, here vclock_mode is changed, and vclock_mode is the indicator used by vDSO too.

This support was introduced in this commit and released in version 3.8 of the Linux kernel.
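To see which clock source a machine is using and roughly what each time call costs, something like the following sketch can help. It is an illustration, not taken from the thread: the class name is made up, the sysfs path is the usual Linux location of the active clocksource (on a KVM guest it typically reads kvm-clock), and the method falls back to "unknown" when the file cannot be read. The timing loop is a crude estimate, not a proper benchmark.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ClockSourceCheck {
    // Usual Linux sysfs location of the active clocksource; absent on other systems.
    static final Path CURRENT_CLOCKSOURCE =
            Paths.get("/sys/devices/system/clocksource/clocksource0/current_clocksource");

    /** Returns the active clocksource name, or "unknown" if it cannot be read. */
    static String currentClockSource() {
        try {
            byte[] raw = Files.readAllBytes(CURRENT_CLOCKSOURCE);
            return new String(raw, StandardCharsets.US_ASCII).trim();
        } catch (IOException e) {
            return "unknown";
        }
    }

    /** Rough average wall-clock cost of one System.currentTimeMillis() call, in nanoseconds. */
    static double nanosPerCall(int calls) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += System.currentTimeMillis();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.println(); // uses sink so the loop is not optimized away
        }
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        System.out.println("clocksource: " + currentClockSource());
        System.out.printf("currentTimeMillis: %.1f ns/call%n", nanosPerCall(5_000_000));
    }
}
```

A cost of hundreds of nanoseconds per call, rather than tens, is a hint that each call is a real system call instead of a vDSO read.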

Generally, in my experience, if you call something inside while(true) for a long time, you will always see high CPU consumption.

Of course, a blocking queue is enough in most cases. If you need low latency, you could spin instead of blocking the thread, but you should limit the number of spin cycles and benchmark to measure the impact of this optimization. The meta code could be something like:

int spin = 100;
while(spin-- > 0) {
    // try to get result
}
// still no result -> execute blocking code
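The meta code above could be filled in roughly as follows. This is a sketch under assumptions: SpinThenBlock and take are illustrative names, and a BlockingQueue is assumed as the source of results. It tries a bounded number of non-blocking polls and only then falls back to a blocking poll with a timeout.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class SpinThenBlock {
    /**
     * Spins for at most maxSpins non-blocking attempts, then blocks for up to
     * timeoutMillis. Returns null if nothing arrived in time.
     */
    static <T> T take(BlockingQueue<T> queue, int maxSpins, long timeoutMillis)
            throws InterruptedException {
        for (int spin = maxSpins; spin > 0; spin--) {
            T item = queue.poll(); // non-blocking: returns null when the queue is empty
            if (item != null) {
                return item;
            }
        }
        // Still no result: park the thread instead of burning CPU.
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("event");
        System.out.println(take(queue, 100, 1000)); // found while spinning
        System.out.println(take(queue, 100, 100));  // queue is empty, so this blocks and times out
    }
}
```

On Java 9 and later, calling Thread.onSpinWait() inside the spin loop is a common refinement. The spin count of 100 is an arbitrary starting point to be tuned by benchmarking.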

So I figured out the issue here. To give more context, the question was really about why vDSO was making real system calls (apologies if the original post was misleading!). The clock source for this kernel version (kvmclock) did not support virtual system calls, and hence real system calls were happening. Support was introduced in this commit: https://github.com/torvalds/linux/commit/3dc4f7cfb7441e5e0fed3a02fc81cdaabd28300a#diff-5a34e1e52f50e00cef4b0d0ff3fef8f7 (thanks to egorlitvinenko for pointing this out).

Also, I do understand that anything in while(true) will consume CPU. Since this was in an Apache Storm context, where the goal was essentially to batch events before making an HTTP call, it could have been done in a better way by using Apache Storm's tick tuple support.
