What is the difference between the "container_memory_working_set_bytes" and "container_memory_rss" metrics on a container?

Question

I need to monitor the memory usage of my containers running on a Kubernetes cluster. After reading some articles, I found two recommended metrics: "container_memory_rss" and "container_memory_working_set_bytes".

The definitions of both metrics, taken from the cAdvisor code, are:

"container_memory_rss": The amount of anonymous and swap cache memory
"container_memory_working_set_bytes": The amount of working set memory; this includes recently accessed memory, dirty memory, and kernel memory

I think both metrics represent the number of bytes of physical memory that a process uses. But the two values differ on my Grafana dashboard.

My questions are:

What is the difference between the two metrics?
Which metric is more appropriate for monitoring memory usage? Some posts say both, because when either of those metrics reaches the limit, the container is OOM-killed.

Answer 1

You are right. I will try to address your questions in more detail.

What is the difference between the two metrics?

container_memory_rss equals the value of total_rss from the /sys/fs/cgroup/memory/memory.stat file:

// The amount of anonymous and swap cache memory (includes transparent
// hugepages).
// Units: Bytes.
RSS uint64 `json:"rss"`

This is the total amount of anonymous and swap cache memory (including transparent hugepages), and it equals the value of total_rss from the memory.stat file. It should not be confused with the true resident set size, that is, the amount of physical memory used by the cgroup. rss + file_mapped (the mapped_file field in memory.stat) gives the resident set size of the cgroup. The resident set size does not include memory that is swapped out. It does include memory from shared libraries, as long as the pages from those libraries are actually in memory, and it includes all stack and heap memory.
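To make the rss + file_mapped relationship concrete, here is a minimal sketch that parses the text of a cgroup v1 memory.stat file and adds the two fields. The function names are hypothetical, and the sketch assumes the hierarchical fields are named total_rss and total_mapped_file; it is not how cAdvisor itself reads these values.

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// parseMemoryStat (hypothetical helper) parses the "key value" lines of a
// cgroup v1 memory.stat file into a map. Malformed lines are skipped.
func parseMemoryStat(content string) map[string]uint64 {
	stats := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue
		}
		value, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		stats[fields[0]] = value
	}
	return stats
}

// residentSetEstimate (hypothetical helper) returns
// total_rss + total_mapped_file, the sum described above as the
// resident set size of the cgroup. Missing keys count as zero.
func residentSetEstimate(stats map[string]uint64) uint64 {
	return stats["total_rss"] + stats["total_mapped_file"]
}
```

In practice the input string would come from reading the memory.stat file of the container's cgroup, for example with os.ReadFile.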

container_memory_working_set_bytes (as already mentioned by Olesya) is the total usage minus inactive file memory. It is an estimate of how much memory cannot be evicted:

// The amount of working set memory, this includes recently accessed memory,
// dirty memory, and kernel memory. Working set is <= "usage".
// Units: Bytes.
WorkingSet uint64 `json:"working_set"`

Working Set is the current size, in bytes, of the working set of this process. The working set is the set of memory pages touched recently by the threads in the process.
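The "total usage - inactive file" rule can be sketched as a small function. This is an illustration under assumptions, not the cAdvisor source: the function name is hypothetical, and clamping at zero is included so the unsigned subtraction cannot wrap around if the inactive file figure ever exceeds usage.

```go
package main

// workingSet (hypothetical helper) returns usage minus inactive file
// memory, both in bytes. The result is clamped at zero so that the
// unsigned subtraction cannot underflow.
func workingSet(usage, inactiveFile uint64) uint64 {
	if inactiveFile >= usage {
		return 0
	}
	return usage - inactiveFile
}
```

For example, a container with 1000 bytes of usage, 300 of which are inactive file cache, has a working set of 700 bytes, which is consistent with the "Working set is <= usage" comment above.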

Which metric is more appropriate for monitoring memory usage? Some posts say both, because when either of those metrics reaches the limit, the container is OOM-killed.

If you are limiting the resource usage of your pods, then you should monitor both, as they will cause an OOM kill if they reach a particular resource limit.

I also recommend this article, which shows an example explaining the assertion below:

You might think that memory utilization is easily tracked with container_memory_usage_bytes; however, this metric also includes cached (think filesystem cache) items that can be evicted under memory pressure. The better metric is container_memory_working_set_bytes, as this is what the OOM killer is watching for.
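As a rough sketch of how such monitoring could be expressed, the functions below compare a working set value against a memory limit. The names and the threshold parameter are hypothetical, and the sketch assumes a limit of 0 means that no limit is configured.

```go
package main

// workingSetRatio (hypothetical helper) returns the working set as a
// fraction of the memory limit. The second result is false when
// limitBytes is 0, which this sketch treats as "no limit configured".
func workingSetRatio(workingSetBytes, limitBytes uint64) (float64, bool) {
	if limitBytes == 0 {
		return 0, false
	}
	return float64(workingSetBytes) / float64(limitBytes), true
}

// nearLimit (hypothetical helper) reports whether the working set has
// reached the given fraction of the limit, e.g. 0.9 for 90%.
func nearLimit(workingSetBytes, limitBytes uint64, threshold float64) bool {
	ratio, ok := workingSetRatio(workingSetBytes, limitBytes)
	return ok && ratio >= threshold
}
```

The same comparison could be applied to the RSS value if both metrics are being watched, as suggested above.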

EDIT:

Adding some additional sources as a supplement:

Answer 1
38 · Accepted · 2021-03-24 10:10:08