
Monitoring Kubernetes Pods in Google Cloud

We have an application deployed on GKE, with a total of 10 pods running and serving it. I am trying to find a metric I can use to create an alert when one of my pods goes down. Alternatively, is there a way to check the status of the pods so that I can set up an alert based on that condition?

I explored GCP and looked through the documentation but couldn't find anything suitable. The only thing I found is the metric below, and I am not sure what it measures. To me it looks like the number of times Kubernetes decided a pod had died and restarted it.

Metric: kubernetes.io/container/restart_count
Resource type: k8s_container

Any advice on this is highly appreciated, as it would let us improve our monitoring based on this metric.

You are right: that metric is the restart count. The documentation describes it as:

Number of times the container has restarted. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.

Read more at: https://cloud.google.com/monitoring/api/metrics_kubernetes
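Because restart_count is a cumulative counter, an alert usually looks at how much it grew over a window rather than at its absolute value. Here is a minimal Python sketch of that idea, assuming the samples have already been fetched as a plain list of integers. The function name and the sample values are hypothetical, and this does not call the Cloud Monitoring API.

```python
from typing import Sequence


def restarts_in_window(samples: Sequence[int]) -> int:
    """Return how many restarts occurred across consecutive cumulative samples.

    A drop in the value is treated as a counter reset (for example, the
    container was recreated), so the new value is counted from zero.
    """
    total = 0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # Counter reset: assume counting started again from 0.
            total += current
    return total


# Hypothetical one-minute samples of the cumulative restart count.
window = [3, 3, 4, 4, 6]
if restarts_in_window(window) > 0:
    print("container restarted during the window")
```

Alerting on the increase rather than the raw value avoids firing forever on a container that restarted once long ago.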

Or

You can use Prometheus to collect the metrics and Grafana to monitor them:

sum(kube_pod_container_status_restarts_total{cluster="$cluster",namespace="$namespace",pod=~"$service.*"})

This returns the total restart count for the matching pods.

Or

You can also use BotKube: https://www.botkube.io/installation/

You can configure it to send a notification (to Slack, for example) when a readiness or liveness probe fails.

Or

You can write your own script and run it on Kubernetes to monitor the cluster and notify you whenever a pod restarts.

Example on GitHub: https://github.com/harsh4870/Slack-Post-On-POD-Ready-State

This script posts to Slack when a pod becomes ready after a deployment; you can change it to monitor the restart count instead.
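As a rough illustration of the do-it-yourself option (this is not the linked repository's code), the sketch below flags pods that are not ready or have restarted. It assumes the input is a dict shaped like the output of `kubectl get pods -o json`. The function name, threshold parameter, and sample data are hypothetical, and sending the notification is left out.

```python
import json
from typing import Any, Dict, List


def pods_needing_attention(pod_list: Dict[str, Any], max_restarts: int = 0) -> List[str]:
    """Return names of pods that are not ready or exceed max_restarts restarts."""
    flagged = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses", [])
        restarts = sum(s.get("restartCount", 0) for s in statuses)
        # A pod with no container statuses yet (e.g. Pending) counts as not ready.
        all_ready = bool(statuses) and all(s.get("ready", False) for s in statuses)
        if restarts > max_restarts or not all_ready:
            flagged.append(name)
    return flagged


# Hypothetical sample shaped like `kubectl get pods -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "app-1"},
   "status": {"phase": "Running",
              "containerStatuses": [{"ready": true, "restartCount": 0}]}},
  {"metadata": {"name": "app-2"},
   "status": {"phase": "Running",
              "containerStatuses": [{"ready": false, "restartCount": 4}]}}
]}
""")
print(pods_needing_attention(sample))
```

In a real script the dict would come from the Kubernetes API or from kubectl, and the flagged names would be passed to whatever notifier you use.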

I would recommend the Prometheus and Grafana option. Stackdriver is good too, but I am not a Google employee.

Why do you want to monitor when a pod is down? Kubernetes will immediately try to start it again on the same node, or on a different one if that node is down for whatever reason.

Instead, there are other metrics you should monitor, such as restart_count, which could indicate that pods are not coming back online. Other useful metrics include:

  • kube_pod_container_status_restarts_total
  • kube_pod_status_phase
  • kube_pod_container_status_running
  • kube_pod_status_phase vs kube_node_status_capacity_pods

This article lists a lot of interesting metrics to monitor: https://medium.com/google-cloud/gke-monitoring-84170ea44833
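To show how kube_pod_status_phase could drive an alert for the 10-pod setup in the question, here is a small Python sketch. It assumes the samples were already scraped into (labels, value) pairs, with the series for a pod's current phase having value 1 and the others 0. The function name, the "prod" namespace, and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Sample = Tuple[Dict[str, str], float]


def running_pod_count(samples: Iterable[Sample], namespace: str) -> int:
    """Count pods whose kube_pod_status_phase series for phase="Running" is 1."""
    return sum(
        1
        for labels, value in samples
        if labels.get("namespace") == namespace
        and labels.get("phase") == "Running"
        and value == 1
    )


EXPECTED_PODS = 10  # from the question: 10 pods serve the application

# Hypothetical samples: one series per (pod, phase) combination.
samples = [
    ({"namespace": "prod", "pod": "app-1", "phase": "Running"}, 1.0),
    ({"namespace": "prod", "pod": "app-1", "phase": "Pending"}, 0.0),
    ({"namespace": "prod", "pod": "app-2", "phase": "Running"}, 0.0),
    ({"namespace": "prod", "pod": "app-2", "phase": "Pending"}, 1.0),
]
running = running_pod_count(samples, "prod")
if running < EXPECTED_PODS:
    print(f"only {running} of {EXPECTED_PODS} pods are Running")
```

Comparing the number of running pods against the expected replica count catches pods stuck in Pending or Failed, which a restart counter alone would miss.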
