How to create a Prometheus alert for Kubernetes resource quotas

I can't find anything that covers this: I need to catch resource quota errors and raise an alert for them in Prometheus.

Has anyone run into this and managed to solve it?

For example, look at this error:

Error from server (Forbidden): error when creating "https://k8s.io/examples/admin/resource/quota-mem-cpu-pod-2.yaml": pods "quota-mem-cpu-demo-2" is forbidden: exceeded quota: mem-cpu-demo, requested: requests.memory=700Mi, used: requests.memory=600Mi, limited: requests.memory=1Gi

How can I alert on this error with Prometheus?

One option: if you run kube-state-metrics, there is a handy kube_resourcequota metric with the labels resource= (e.g. limits.cpu) and type= (hard/used).

You could alert when a namespace goes over X% of its quota for a given resource. The expression below fires when a namespace exceeds 90% of a ResourceQuota, and it covers every quota-limited resource (limits.cpu, limits.memory, etc.).

If you administer a cluster with many teams, each with its own namespace, this lets you keep an eye on runaway resource usage (e.g. misconfigured CronJobs chewing up CPU) and alert the responsible team.

100 * kube_resourcequota{type="used"} / ignoring(instance, job, type) (kube_resourcequota{type="hard"} > 0) > 90
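
To turn the expression above into an actual alert, it could be wrapped in a Prometheus alerting rule. The following is only a minimal sketch: the group name, alert name, `for` duration, severity label and annotation text are assumptions for illustration and are not part of the original answer.

```yaml
groups:
  - name: resourcequota.rules               # hypothetical group name
    rules:
      - alert: NamespaceQuotaAlmostExhausted  # hypothetical alert name
        # Same expression as above: percentage of each hard quota in use.
        expr: |
          100 * kube_resourcequota{type="used"}
            / ignoring(instance, job, type)
          (kube_resourcequota{type="hard"} > 0)
          > 90
        for: 5m                              # assumed; tune to taste
        labels:
          severity: warning                  # assumed label
        annotations:
          summary: "Namespace {{ $labels.namespace }} is at {{ $value | humanize }}% of its {{ $labels.resource }} quota"
```

Note that a rule like this warns as a namespace approaches its quota; it does not react to the Forbidden error itself.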

kube-state-metrics: https://github.com/kubernetes/kube-state-metrics/blob/master/docs/resourcequota-metrics.md

I lifted the above expression from here: https://sysdig.com/blog/alerting-kubernetes/
