
"Memory used" metric: Go tool pprof vs docker stats

I wrote a Go application that runs in each of my Docker containers. The instances communicate with each other over TCP and UDP using protobufs, and I use Hashicorp's memberlist library to discover the other containers in my network. In docker stats I see that memory usage is increasing linearly, so I am trying to find any leaks in my application.

Since it is a long-running application, I am using the HTTP pprof endpoint to inspect the live application in one of the containers. I see that runtime.MemStats.Sys stays constant even though docker stats keeps increasing linearly. My --inuse_space is around 1MB, and --alloc_space of course keeps growing over time. Here is a sample of alloc_space:

root@n3:/app# go tool pprof --alloc_space main http://localhost:8080/debug/pprof/heap                                                                                                                       
Fetching profile from http://localhost:8080/debug/pprof/heap
Saved profile in /root/pprof/pprof.main.localhost:8080.alloc_objects.alloc_space.005.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top --cum
1024.11kB of 10298.19kB total ( 9.94%)
Dropped 8 nodes (cum <= 51.49kB)
Showing top 10 nodes out of 34 (cum >= 1536.07kB)
      flat  flat%   sum%        cum   cum%
         0     0%     0% 10298.19kB   100%  runtime.goexit
         0     0%     0%  6144.48kB 59.67%  main.Listener
         0     0%     0%  3072.20kB 29.83%  github.com/golang/protobuf/proto.Unmarshal
  512.10kB  4.97%  4.97%  3072.20kB 29.83%  github.com/golang/protobuf/proto.UnmarshalMerge
         0     0%  4.97%  2560.17kB 24.86%  github.com/hashicorp/memberlist.(*Memberlist).triggerFunc
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).Unmarshal
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).dec_struct_message
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).unmarshalType
  512.01kB  4.97%  9.94%  2048.23kB 19.89%  main.SaveAsFile
         0     0%  9.94%  1536.07kB 14.92%  reflect.New
(pprof) list main.Listener
Total: 10.06MB
ROUTINE ======================== main.Listener in /app/listener.go
         0        6MB (flat, cum) 59.67% of Total
         .          .     24:   l.SetReadBuffer(MaxDatagramSize)
         .          .     25:   defer l.Close()
         .          .     26:   m := new(NewMsg)
         .          .     27:   b := make([]byte, MaxDatagramSize)
         .          .     28:   for {
         .   512.02kB     29:       n, src, err := l.ReadFromUDP(b)
         .          .     30:       if err != nil {
         .          .     31:           log.Fatal("ReadFromUDP failed:", err)
         .          .     32:       }
         .   512.02kB     33:       log.Println(n, "bytes read from", src)
         .          .     34:       //TODO remove later. For testing Fetcher only
         .          .     35:       if rand.Intn(100) < MCastDropPercent {
         .          .     36:           continue
         .          .     37:       }
         .        3MB     38:       err = proto.Unmarshal(b[:n], m)
         .          .     39:       if err != nil {
         .          .     40:           log.Fatal("protobuf Unmarshal failed", err)
         .          .     41:       }
         .          .     42:       id := m.GetHead().GetMsgId()
         .          .     43:       log.Println("CONFIG-UPDATE-RECEIVED { \"update_id\" =", id, "}")
         .          .     44:       //TODO check whether value already exists in store?
         .          .     45:       store.Add(id)
         .        2MB     46:       SaveAsFile(id, b[:n], StoreDir)
         .          .     47:       m.Reset()
         .          .     48:   }
         .          .     49:}
(pprof) 

I have been able to verify that no goroutine leak is happening using http://:8080/debug/pprof/goroutine?debug=1
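
For reference, the profiling endpoint is the standard one from net/http/pprof; below is a minimal sketch of that kind of setup (the port and handler paths are the package defaults, not necessarily my exact wiring):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Serve the profiling endpoint alongside the application so that
	// "go tool pprof http://localhost:8080/debug/pprof/heap" works against the live process.
	go func() {
		log.Println(http.ListenAndServe(":8080", nil))
	}()

	// ... application setup: UDP Listener, memberlist, etc. ...
	select {}
}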

Please comment on why docker stats shows a different picture (linearly increasing memory):

CONTAINER           CPU %               MEM USAGE / LIMIT       MEM %               NET I/O               BLOCK I/O           PIDS
n3                  0.13%               19.73 MiB / 31.36 GiB   0.06%               595 kB / 806 B        0 B / 73.73 kB      14

If I run it overnight, this memory bloats to around 250 MB. I have not run it longer than that, but I feel it should have reached a plateau instead of increasing linearly.
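
One way to check whether the Go runtime's view of memory really does plateau while docker stats keeps climbing is to log runtime.MemStats periodically and compare the two over the same window. A rough sketch (the interval and the chosen fields are arbitrary):

package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	var m runtime.MemStats
	for range time.Tick(10 * time.Second) {
		runtime.ReadMemStats(&m)
		// Sys is what the runtime has obtained from the OS; HeapInuse vs HeapReleased
		// shows how much of that is live versus already returned to the OS.
		log.Printf("Sys=%v HeapSys=%v HeapInuse=%v HeapReleased=%v NumGC=%v",
			m.Sys, m.HeapSys, m.HeapInuse, m.HeapReleased, m.NumGC)
	}
}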

docker stats shows the memory usage stats from cgroups. (Refer: https://docs.docker.com/engine/admin/runmetrics/ )

If you read the "outdated but useful" documentation ( https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt ), it says:

5.5 usage_in_bytes

For efficiency, as other kernel components, memory cgroup uses some optimization to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz value for efficient access. (Of course, when necessary, it's synchronized.) If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) value in memory.stat (see 5.2).

Page cache and RSS are both included in the usage_in_bytes number, so if the container does file I/O, the memory usage stat will increase. However, if a container's usage hits its maximum limit, the kernel reclaims some of the memory that is unused. Hence, when I added a memory limit to my container, I could observe that memory is reclaimed and reused when the limit is hit. The container processes are not killed unless there is no memory left to reclaim and an OOM error happens. For anyone concerned with the numbers shown in docker stats, the easy way is to check the detailed stats available in cgroups at the path /sys/fs/cgroup/memory/docker/<container-id>/, which shows all the memory metrics in detail in memory.stat and the other memory.* files.
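
As a quick illustration, a small Go program like the following can read memory.stat and print the rss/cache components that the kernel doc refers to. It assumes cgroup v1; inside the container the file is at /sys/fs/cgroup/memory/memory.stat, while on the host it sits under the per-container directory mentioned above:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	// Path as seen from inside the container (cgroup v1). From the host, use
	// /sys/fs/cgroup/memory/docker/<container-id>/memory.stat instead.
	f, err := os.Open("/sys/fs/cgroup/memory/memory.stat")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue
		}
		// rss + cache (+ swap) is the "more exact" usage the kernel doc refers to.
		switch fields[0] {
		case "rss", "cache", "swap", "mapped_file":
			fmt.Println(fields[0], fields[1], "bytes")
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}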

If you want to limit the resources used by the Docker container in the "docker run" command, you can do so by following this reference: https://docs.docker.com/engine/admin/resource_constraints/

Since I am using docker-compose, I did it by adding a line in my docker-compose.yml file under the service I wanted to limit:

mem_limit: 32m

where m stands for megabytes.
