
How do I measure percent CPU usage using Prometheus?

I'm trying to use Prometheus metrics to get the percent CPU usage of each microservice running in Kubernetes, so that I can tune CPU resources and limits.

In my setup, each customer has 4 microservices running on the server. Each microservice has its own memory request and limit and its own CPU request and limit. To get the average from Prometheus I am using the following query:

avg_over_time(sum(rate(container_cpu_usage_seconds_total{name=~"^k8s_.*", namespace=~"$namespace", container_name!="POD", pod=~"^$Deployment.*$"}[5m]))[24h:5m]) / avg_over_time(sum(container_spec_cpu_quota{name=~"^k8s_.*", namespace=~"$namespace", container_name!="POD", pod=~"^$Deployment.*$"} / container_spec_cpu_period{name=~"^k8s_.*", namespace=~"$namespace", container_name!="POD", pod=~"^$Deployment.*$"})[24h:5m]) * 100

To check that the value above is correct, I go into each Kubernetes pod and check the CPU usage using the command: kubectl -n {namespace} top pod {Deployment}

To check the CPU limit, I run the following command and read the limit from its output: kubectl -n {namespace} describe pod {Deployment}

Then I do the calculation: CPU usage divided by CPU limit times 100 equals current percent of CPU usage.
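The manual calculation described above can be sketched in Python. This is only an illustration: the function names parse_cpu and cpu_percent are hypothetical, and the readings in the example ("2m" of usage, a limit of "1" core) are made-up values, not taken from the question. It assumes CPU quantities are written either in millicores with an "m" suffix or as plain cores.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "250m" or "0.5" to cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # Millicores: 1000m equals one core.
        return float(quantity[:-1]) / 1000.0
    return float(quantity)


def cpu_percent(usage: str, limit: str) -> float:
    """CPU usage divided by CPU limit, times 100."""
    limit_cores = parse_cpu(limit)
    if limit_cores <= 0:
        raise ValueError("CPU limit must be positive")
    return parse_cpu(usage) / limit_cores * 100.0


# Hypothetical readings: usage from `kubectl top`, limit from `kubectl describe`.
print(cpu_percent("2m", "1"))
```

A reading such as "2m" carries a single significant digit, so a percentage computed this way is coarse compared with a value averaged over many samples.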

The values I get from the CPU usage and limit in Kubernetes differ from the values I get using the Prometheus query (some are close and some are quite far off). Here is an example of CPU usage in percent from Prometheus and from Kubernetes:

Customer    Service    Prometheus  Kubernetes
Customer A  Service 1  0.216       0.2
Customer A  Service 2  0.137       0.2
Customer A  Service 3  0.445       0.45
Customer A  Service 4  0.165       0.2
Customer B  Service 1  0.139       0.2
Customer B  Service 2  0.0917      0.2
Customer B  Service 3  0.5739      0.5
Customer B  Service 4  0.0972      0.2
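To see which services are close and which are far off, the gap between the two columns can be computed directly. The sketch below copies the rows from the table above; the helper name differences is hypothetical, and the rounding to four decimals is just a display choice.

```python
# (customer, service, prometheus, kubernetes) rows copied from the table above.
rows = [
    ("Customer A", "Service 1", 0.216, 0.2),
    ("Customer A", "Service 2", 0.137, 0.2),
    ("Customer A", "Service 3", 0.445, 0.45),
    ("Customer A", "Service 4", 0.165, 0.2),
    ("Customer B", "Service 1", 0.139, 0.2),
    ("Customer B", "Service 2", 0.0917, 0.2),
    ("Customer B", "Service 3", 0.5739, 0.5),
    ("Customer B", "Service 4", 0.0972, 0.2),
]


def differences(table):
    """Return (customer, service, prometheus - kubernetes) for each row."""
    return [(c, s, round(p - k, 4)) for c, s, p, k in table]


for customer, service, diff in differences(rows):
    print(f"{customer} {service}: {diff:+.4f}")
```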

Does anyone have comments on whether I am doing the measurements correctly? Is there a mistake in my Prometheus query, or in how I get the values from Kubernetes? I want to make sure that I am measuring the percent CPU usage correctly using Prometheus.

Can you try the following query for one service, and then modify it according to your requirements:

sum (rate (container_cpu_usage_seconds_total{id="/"}[1m])) / sum (machine_cpu_cores) * 100
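As a rough illustration of what this query computes, the sketch below derives a percentage from two samples of a cumulative CPU-seconds counter and a core count. It is a simplification: Prometheus's rate() also handles counter resets and extrapolates to the window boundaries, which is not modelled here. The function name and the sample numbers are hypothetical.

```python
def cpu_percent_from_counter(v_start: float, v_end: float,
                             t_start: float, t_end: float,
                             cores: float) -> float:
    """Approximate rate(counter[window]) / cores * 100 from two samples."""
    if t_end <= t_start:
        raise ValueError("t_end must be later than t_start")
    if cores <= 0:
        raise ValueError("cores must be positive")
    # CPU-seconds consumed per wall-clock second, i.e. cores in use.
    rate = (v_end - v_start) / (t_end - t_start)
    return rate / cores * 100.0


# Hypothetical samples: the counter grows from 1000.0 to 1012.0 CPU-seconds
# over a 60-second window on a 4-core machine.
print(cpu_percent_from_counter(1000.0, 1012.0, 0.0, 60.0, 4))
```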

I also track the CPU usage for each pod:

sum (rate (container_cpu_usage_seconds_total{image!=""}[1m])) by (pod_name)

I have a complete kubernetes-prometheus solution on GitHub that may help you with more metrics: https://github.com/camilb/prometheus-kubernetes
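If you want to pull the per-pod values into a script for comparison, one option is the Prometheus HTTP API instant-query endpoint (/api/v1/query). The sketch below is a minimal example under assumptions: the base URL is a placeholder for your own server, the helper names are hypothetical, and the sample response is hand-written to match the shape of an instant-query result rather than captured from a real cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def usage_by_pod(response: dict) -> dict:
    """Map each pod_name label to its sample value (in cores)."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    result = {}
    for sample in response["data"]["result"]:
        pod = sample["metric"].get("pod_name", "")
        _timestamp, value = sample["value"]  # value arrives as a string
        result[pod] = float(value)
    return result


def fetch_usage_by_pod(base_url: str, promql: str) -> dict:
    """Run the query against a live server (not exercised in this sketch)."""
    with urlopen(build_query_url(base_url, promql)) as resp:
        return usage_by_pod(json.load(resp))


# Hand-written sample with made-up pod names and values.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"pod_name": "service-1-abc"}, "value": [1600000000, "0.002"]},
            {"metric": {"pod_name": "service-2-def"}, "value": [1600000000, "0.0015"]},
        ],
    },
}
print(usage_by_pod(sample_response))
```

Against a real server you would call fetch_usage_by_pod with your Prometheus address and the per-pod query shown above.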

I hope this will help.

The result is pretty much the same as the Windows performance manager. So, for CPU % of running services (tasks: processes):

sum by (process,hostname)(irate(wmi_process_cpu_time_total{scaleset="name", process=~
