How do I use the perf tool when running stress-ng in Docker?

Question

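I am using the stress-ng Docker image from https://hub.docker.com/r/polinux/stress-ng/dockerfile to stress my system. I want to use the perf tool to monitor metrics.

perf stat -- stress-ng --cpu 2 --timeout 10 runs stress-ng for 10 seconds and returns performance metrics. I tried to do the same with the Docker image using perf stat -- docker run -ti --rm polinux/stress-ng --cpu 2 --timeout 10. This returns metrics, but not the metrics of stress-ng.

The output I get when using "perf stat" on stress-ng: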

Performance counter stats for 'stress-ng --cpu 2 --timeout 10':

  19975.863889      task-clock (msec)         #    1.992 CPUs utilized          
         2,057      context-switches          #    0.103 K/sec                  
             7      cpu-migrations            #    0.000 K/sec                  
         8,783      page-faults               #    0.440 K/sec                  
52,568,560,651      cycles                    #    2.632 GHz                    
89,424,109,426      instructions              #    1.70  insn per cycle         
17,496,929,762      branches                  #  875.904 M/sec                  
    97,910,697      branch-misses             #    0.56% of all branches        

  10.025825765 seconds time elapsed

The output I get when using the perf tool on the Docker image:

Performance counter stats for 'docker run -ti --rm polinux/stress-ng --cpu 2 --timeout 10':

    154.613610      task-clock (msec)         #    0.014 CPUs utilized          
           858      context-switches          #    0.006 M/sec                  
           113      cpu-migrations            #    0.731 K/sec                  
         4,989      page-faults               #    0.032 M/sec                  
   252,242,504      cycles                    #    1.631 GHz                    
   375,927,959      instructions              #    1.49  insn per cycle         
    84,847,109      branches                  #  548.769 M/sec                  
     1,127,634      branch-misses             #    1.33% of all branches        

  10.704752134 seconds time elapsed

Can someone help me get the stress-ng metrics when it is run through Docker?

Answer 1

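Continuing from @osgx's comment,

By default, the perf stat command monitors not only all threads of the monitored process, but also its child processes and their threads.

The problem in this case is that by running perf stat on the docker run stress-ng command, you are not monitoring the actual stress-ng process. Note that the processes running as part of a container are not started by the docker client, but by the docker-containerd-shim process (which is a grandchild of the dockerd process).

This becomes evident if you run the docker command that starts stress-ng inside a container and then look at the process tree.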

docker run -ti --name=stress-ng --rm polinux/stress-ng --cpu 2 --timeout 100

ps -elf | grep docker

0 S ubuntu    26379 114001  0  80   0 - 119787 futex_ 12:33 pts/3   00:00:00 docker run -ti --name=stress-ng --rm polinux/stress-ng --cpu 2 --timeout 10000
4 S root      26431 118477  0  80   0 -  2227 -      12:33 ?        00:00:00 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/72a8c2787390669ff4eeae6f343ab4f9f60434f39aae66b1a778e78b7e5e45d8 -address /var/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /var/run/docker/runtime-runc
0 S ubuntu    26610  26592  0  80   0 -  3236 pipe_w 12:34 pts/6    00:00:00 grep --color=auto docker
4 S root     118453      1  3  80   0 - 283916 -     May02 ?        01:01:57 /usr/bin/dockerd -H fd://
4 S root     118477 118453  4  80   0 - 457853 -     May02 ?        01:14:36 docker-containerd --config /var/run/docker/containerd/containerd.toml

----------------------------------------------------------------------

ps -elf | grep stress-ng

0 S ubuntu    26379 114001  0  80   0 - 119787 futex_ 12:33 pts/3   00:00:00 docker run -ti --name=stress-ng --rm polinux/stress-ng --cpu 2 --timeout 10000
4 S root      26455  26431  0  80   0 - 16621 -      12:33 pts/0    00:00:00 /usr/bin/stress-ng --cpu 2 --timeout 10000
1 R root      26517  26455 99  80   0 - 16781 -      12:33 pts/0    00:01:08 /usr/bin/stress-ng --cpu 2 --timeout 10000
1 R root      26518  26455 99  80   0 - 16781 -      12:33 pts/0    00:01:08 /usr/bin/stress-ng --cpu 2 --timeout 10000
0 S ubuntu    26645  26592  0  80   0 -  3236 pipe_w 12:35 pts/6    00:00:00 grep --color=auto stress-ng

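The PPID of the first stress-ng process is 26431, which is not the docker run command but the docker-containerd-shim process. Monitoring the docker run command will never reflect the correct values, because the docker client is completely detached from the process that starts stress-ng.

One way to solve this is to attach perf stat to the PIDs of the stress-ng processes started by the Docker runtime.

For example, in the case above, as soon as the docker run command has been started, you can run: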
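The parent/child relationship can also be checked directly instead of reading it off the ps listing. The following is a minimal sketch for Linux that reads /proc; the helper names parent_pid and proc_name are made up for illustration and are not part of Docker or perf.

```shell
#!/bin/sh
# Hypothetical helpers (Linux only): inspect the process tree through /proc.

# Print the parent PID of the given PID.
parent_pid() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status"
}

# Print the command name of the given PID.
# Note: the kernel truncates this name to 15 characters.
proc_name() {
    cat "/proc/$1/comm"
}

# Usage, with the PID of the first stress-ng process from the listing above:
#   proc_name "$(parent_pid 26455)"
```

If the name printed is that of the containerd shim rather than docker, the stress-ng process is not a descendant of the docker client, so perf stat on docker run cannot see it.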

perf stat -p 26455,26517,26518

 Performance counter stats for process id '26455,26517,26518':

     148171.516145      task-clock (msec)         #    1.939 CPUs utilized          
                49      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                67      page-faults               #    0.000 K/sec
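Copying the PIDs by hand is slow, and the measurement window is already shrinking while you do it. As a rough sketch, the PID list can be collected with pgrep and joined into the comma-separated form that perf stat -p accepts. The helper names join_pids and perf_stat_by_name are hypothetical. The sketch also assumes that no other stress-ng instances are running on the host, because pgrep matches by process name and would pick those up too.

```shell
#!/bin/sh
# Join PIDs (one per line on stdin) into a comma-separated list.
join_pids() {
    paste -sd, -
}

# Hypothetical helper: attach perf stat to every running process whose
# name matches the given pattern. Run it after the container has started.
perf_stat_by_name() {
    pids=$(pgrep "$1" | join_pids)
    if [ -z "$pids" ]; then
        echo "no process matching '$1' found" >&2
        return 1
    fi
    perf stat -p "$pids"
}

# Usage (typically needs root, since the container processes run as root):
#   perf_stat_by_name stress-ng
```

perf stat keeps counting until it is interrupted with Ctrl-C or the monitored processes exit.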

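You may want to increase --timeout a little so that the command runs longer, since you are now starting perf stat after stress-ng has started. You also have to account for the small fraction of initial measurement time that is lost.

Another way is to run perf stat inside the Docker container, something like docker run perf stat ..., but for this you would have to grant privileges to your container, because by default the perf_event_open system call is blacklisted in Docker. See this answer for details.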
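A sketch of that second approach is below. It assumes an image that contains both perf and stress-ng; the name my-stress-ng-perf is a placeholder, and the image from the question may not ship perf at all. The DOCKER variable exists only so the command can be previewed with DOCKER=echo before it is run for real.

```shell
#!/bin/sh
# Placeholder image name (an assumption): must contain perf and stress-ng.
IMAGE="${IMAGE:-my-stress-ng-perf}"
# Set DOCKER=echo to print the docker arguments instead of running them.
DOCKER="${DOCKER:-docker}"

# Run stress-ng under perf stat inside a privileged container.
#   $1 = number of CPU workers, $2 = timeout passed to stress-ng
perf_in_container() {
    cpus=$1
    timeout=$2
    "$DOCKER" run --rm --privileged "$IMAGE" \
        perf stat -- stress-ng --cpu "$cpus" --timeout "$timeout"
}

# Usage:
#   perf_in_container 2 10
```

--privileged is the broadest way to lift the restriction. Depending on your setup, a narrower capability or seccomp setting may be enough.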
