Monitoring pod termination time with Prometheus

Question

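I am looking for a Prometheus metric that lets me monitor how long a pod spends in the Terminating state before it disappears.

I tried kube_pod_container_status_terminated, but it seems to register pods only after they have finished terminating, so it doesn't help me see how long termination takes.
I also looked at kube_pod_status_phase, which I found in this channel a while ago, but it seems to lack this insight as well.

I currently collect metrics about my k8s workloads with cAdvisor, kube-state-metrics, and the Prometheus node-exporter, but I am happy to consider other collectors if they expose the data I need.
Non-Prometheus solutions are welcome too.
Any ideas? Thanks!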

Answer 1

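Kubernetes itself, Heapster, and metrics-server do not provide such a metric, but you can get metrics close to the ones you mentioned by installing kube-state-metrics. It exposes several pod metrics that reflect pod state: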

kube_pod_status_phase
kube_pod_container_status_terminated
kube_pod_container_status_terminated_reason
kube_pod_container_status_last_terminated_reason

You can find the full list of pod metrics exposed by kube-state-metrics in its documentation.

There is also a Bitnami Helm chart that simplifies installing kube-state-metrics.

Answer 2

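According to the pod-metrics documentation:

For some cases, such as "Terminating" and "Unknown", getting the Pod state is not straightforward, because it is not stored behind a field in Pod.Status.

So, to mimic the logic used by the kubectl command line, you need to combine multiple metrics. [...]

For Pods in the Terminating state: count(kube_pod_deletion_timestamp) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod)

Here is an example of a Prometheus rule that can be used to alert on Pods that have been in the Terminating state for more than 5m.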

groups:
- name: Pod state
  rules:
  - alert: PodsBlockInTerminatingState
    expr: count(kube_pod_deletion_timestamp) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod) > 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: Pod {{$labels.namespace}}/{{$labels.pod}} blocked in Terminating state.
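
The alert above only fires once a Pod has been stuck for 5m; it does not report a duration. If an approximate duration is what you need, one possible approach is a recording rule built on the same kube_pod_deletion_timestamp metric, whose value is a Unix timestamp. The following is a minimal sketch, not a tested setup. The group name and the recorded series name are made up for illustration. As far as I understand, Kubernetes sets the deletion timestamp to the time of the delete request plus the grace period, so the value may be negative while the grace period is still running and then counts the seconds the Pod has stayed around past that deadline.

```yaml
groups:
- name: Pod termination duration (sketch)   # hypothetical group name
  rules:
  # Seconds elapsed since each pod's deletion timestamp.
  # Only pods that still exist and have a deletion timestamp produce a sample.
  - record: namespace_pod:terminating_seconds   # hypothetical series name
    expr: time() - max by (namespace, pod) (kube_pod_deletion_timestamp)
```

The series disappears once the Pod is gone, so the last value seen is roughly the total. A query such as max_over_time(namespace_pod:terminating_seconds[1h]) would give that last value for Pods removed within the past hour, with a resolution limited by the scrape and rule evaluation intervals.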
