How to use the perf tool when running stress-ng in Docker?

Question

I am using the stress-ng Docker image from https://hub.docker.com/r/polinux/stress-ng/dockerfile to stress my system. I want to use the perf tool to monitor metrics.

perf stat -- stress-ng --cpu 2 --timeout 10 runs stress-ng for 10 seconds and returns performance metrics. I tried to do the same with the Docker image by using perf stat -- docker run -ti --rm polinux/stress-ng --cpu 2 --timeout 10. This returns metrics, but not the metrics of stress-ng.

The output I got when using perf stat on stress-ng directly:

Performance counter stats for 'stress-ng --cpu 2 --timeout 10':

  19975.863889      task-clock (msec)         #    1.992 CPUs utilized          
         2,057      context-switches          #    0.103 K/sec                  
             7      cpu-migrations            #    0.000 K/sec                  
         8,783      page-faults               #    0.440 K/sec                  
52,568,560,651      cycles                    #    2.632 GHz                    
89,424,109,426      instructions              #    1.70  insn per cycle         
17,496,929,762      branches                  #  875.904 M/sec                  
    97,910,697      branch-misses             #    0.56% of all branches        

  10.025825765 seconds time elapsed

The output I got when using perf stat on the docker run command:

Performance counter stats for 'docker run -ti --rm polinux/stress-ng --cpu 2 --timeout 10':

    154.613610      task-clock (msec)         #    0.014 CPUs utilized          
           858      context-switches          #    0.006 M/sec                  
           113      cpu-migrations            #    0.731 K/sec                  
         4,989      page-faults               #    0.032 M/sec                  
   252,242,504      cycles                    #    1.631 GHz                    
   375,927,959      instructions              #    1.49  insn per cycle         
    84,847,109      branches                  #  548.769 M/sec                  
     1,127,634      branch-misses             #    1.33% of all branches        

  10.704752134 seconds time elapsed

Can someone please help me get the metrics of stress-ng when it is run using Docker?

Answer 1

Carrying on from the comments by @osgx:

As is mentioned here, by default the perf stat command monitors not only all the threads of the process being monitored, but also its child processes and their threads.

The problem in this situation is that by running perf stat on the docker run stress-ng command, you are not monitoring the actual stress-ng process. It is important to note that the processes running inside the container are not started by the docker client, but by the docker-containerd-shim process (which is a grandchild of the dockerd process).

This becomes evident if you start stress-ng inside the container with docker run and then look at the process tree.

docker run -ti --name=stress-ng --rm polinux/stress-ng --cpu 2 --timeout 100

ps -elf | grep docker

0 S ubuntu    26379 114001  0  80   0 - 119787 futex_ 12:33 pts/3   00:00:00 docker run -ti --name=stress-ng --rm polinux/stress-ng --cpu 2 --timeout 10000
4 S root      26431 118477  0  80   0 -  2227 -      12:33 ?        00:00:00 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/72a8c2787390669ff4eeae6f343ab4f9f60434f39aae66b1a778e78b7e5e45d8 -address /var/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /var/run/docker/runtime-runc
0 S ubuntu    26610  26592  0  80   0 -  3236 pipe_w 12:34 pts/6    00:00:00 grep --color=auto docker
4 S root     118453      1  3  80   0 - 283916 -     May02 ?        01:01:57 /usr/bin/dockerd -H fd://
4 S root     118477 118453  4  80   0 - 457853 -     May02 ?        01:14:36 docker-containerd --config /var/run/docker/containerd/containerd.toml

----------------------------------------------------------------------

ps -elf | grep stress-ng

0 S ubuntu    26379 114001  0  80   0 - 119787 futex_ 12:33 pts/3   00:00:00 docker run -ti --name=stress-ng --rm polinux/stress-ng --cpu 2 --timeout 10000
4 S root      26455  26431  0  80   0 - 16621 -      12:33 pts/0    00:00:00 /usr/bin/stress-ng --cpu 2 --timeout 10000
1 R root      26517  26455 99  80   0 - 16781 -      12:33 pts/0    00:01:08 /usr/bin/stress-ng --cpu 2 --timeout 10000
1 R root      26518  26455 99  80   0 - 16781 -      12:33 pts/0    00:01:08 /usr/bin/stress-ng --cpu 2 --timeout 10000
0 S ubuntu    26645  26592  0  80   0 -  3236 pipe_w 12:35 pts/6    00:00:00 grep --color=auto stress-ng

The PPID of the first stress-ng process is 26431, which is not the docker run command but the docker-containerd-shim process. Monitoring the docker run command will never reflect correct values, because the docker client is completely detached from the process that starts the stress-ng commands.
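The parent chain can also be checked without reading the ps listing by eye. The following is a minimal sketch, assuming a Linux host with ps available; `ancestry` is a hypothetical helper name, not part of Docker or perf. It walks the PPID links from a given PID up to PID 1 and prints each process on the way.

```shell
# Print the ancestry of a PID, from the process itself up to PID 1.
# "ancestry" is an illustrative helper name (assumption).
ancestry() {
  local pid=$1
  # Stop when the parent is 0 (reached above PID 1) or the PID no longer exists.
  while [ "$pid" -gt 0 ] 2>/dev/null; do
    # Show PID and command name of the current process.
    ps -o pid=,comm= -p "$pid"
    # Move one step up: read the parent PID and strip padding spaces.
    pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
  done
}
```

Called as `ancestry 26455` on the system above, the chain would be expected to pass through docker-containerd-shim, docker-containerd and dockerd, and never through the docker run client.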

One way to get around this problem is to attach perf stat to the PIDs of the stress-ng processes that are started by the Docker runtime.

For example, in the case above, once the docker run command has been started, you can immediately run:

perf stat -p 26455,26517,26518

 Performance counter stats for process id '26455,26517,26518':

     148171.516145      task-clock (msec)         #    1.939 CPUs utilized          
                49      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                67      page-faults               #    0.000 K/sec

You may want to increase --timeout a little so that the command runs longer, since perf stat is now started after stress-ng has already started. You also have to account for the small fraction of measuring time that is lost at the beginning.
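Looking up the PIDs by hand costs measuring time, so the lookup can be scripted. The sketch below assumes the container was started with --name=stress-ng as in the example above, and that pgrep and paste are available on the host. The helper names `pid_tree`, `join_commas` and `perf_stat_container` are made up for illustration. It reads the host PID of the container's main process with docker inspect, collects all descendants, and passes them to perf stat -p as a comma-separated list.

```shell
# Print the given PID followed by all of its descendants, one per line.
pid_tree() {
  local pid=$1 child
  echo "$pid"
  # pgrep -P lists the direct children of a PID; recurse into each of them.
  for child in $(pgrep -P "$pid"); do
    pid_tree "$child"
  done
}

# Join the lines on stdin into one comma-separated list,
# which is the format perf stat -p expects.
join_commas() {
  paste -sd, -
}

# Attach perf stat to every process of the named container.
perf_stat_container() {
  local name=$1 root_pid
  # .State.Pid is the host PID of the container's main process.
  root_pid=$(docker inspect --format '{{.State.Pid}}' "$name") || return 1
  perf stat -p "$(pid_tree "$root_pid" | join_commas)"
}
```

With the container running in one terminal, `perf_stat_container stress-ng` in another terminal would attach to all stress-ng processes at once. Since those processes run as root, perf will most likely need to be run as root too.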

The other way is to run perf stat inside the Docker container, something like docker run ... perf stat ..., but for that you have to grant additional privileges to your container, since the perf_event_open system call is blocked in Docker by default. See the answer linked here for details.
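A sketch of this second approach follows, under several assumptions: the image passed in must contain both perf and stress-ng (the polinux/stress-ng image may not ship perf, so a custom image is probably needed), and adding the SYS_ADMIN capability is assumed to be enough for Docker's default seccomp profile to permit perf_event_open. On some hosts the kernel.perf_event_paranoid setting or a stricter profile may still get in the way. `perf_inside_container` is a hypothetical helper name.

```shell
# Run "perf stat -- stress-ng <args>" inside a container.
# $1 is an image that contains both perf and stress-ng (assumption);
# the remaining arguments are passed on to stress-ng.
perf_inside_container() {
  local image=$1
  shift
  # --cap-add SYS_ADMIN: extra privilege so perf_event_open is allowed (assumption).
  # --entrypoint perf:   replace the image's entrypoint with perf.
  docker run --rm -ti \
    --cap-add SYS_ADMIN \
    --entrypoint perf \
    "$image" stat -- stress-ng "$@"
}
```

For example, `perf_inside_container my-stress-perf-image --cpu 2 --timeout 10`, where my-stress-perf-image is a placeholder name. Because perf is now an ancestor of stress-ng inside the container, the counters cover the stress-ng workers from the start, with no measuring time lost.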

Answer 1 (score 2, accepted), 2020-05-03 12:56:14
