
How to run a pod based on a Prometheus alert

Is there any way we can run a pod based on an alert fired from Prometheus? We have a scenario where we need to execute a pod when a disk-pressure threshold is crossed. I am able to create the alert, but I also need to execute a pod. How can I achieve that?

groups:
  - name: node_memory_MemAvailable_percent
    rules:
    - alert: node_memory_MemAvailable_percent_alert
      annotations:
        description: Memory on node {{ $labels.instance }} currently at {{ $value }}% 
          is under pressure
        summary: Memory usage is under pressure, system may become unstable.
      expr: |
        100 - ((node_memory_MemAvailable_bytes{job="node-exporter"} * 100) / node_memory_MemTotal_bytes{job="node-exporter"}) > 80
      for: 2m
      labels:
        severity: warning

I think the Alertmanager can help you, using the webhook receiver (documentation).

In this way, when the alert is triggered, Prometheus sends it to the Alertmanager, which then does a POST to your custom webhook.

Of course, you need to implement a service that handles the alert and runs your action.
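
For illustration, here is a minimal sketch of the Alertmanager side. The receiver name and webhook URL are placeholders (I'm assuming a handler service named alert-handler in a monitoring namespace); the service behind that URL is the part you have to implement yourself, for example something that calls the Kubernetes API to create the pod:

route:
  receiver: default
  routes:
  # route only the relevant alert to the custom webhook receiver
  - receiver: pod-runner
    match:
      alertname: node_memory_MemAvailable_percent_alert

receivers:
- name: default
- name: pod-runner
  webhook_configs:
  # Alertmanager POSTs the alert payload as JSON to this URL
  - url: http://alert-handler.monitoring.svc:8080/alerts
    send_resolved: true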

Note that your question talks about disk pressure, but the rule you posted actually measures available memory. If you want to scale your replicas up and down based on memory usage, you can use the Horizontal Pod Autoscaler:

The Horizontal Pod Autoscaler is implemented as a control loop, with a period controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag (with a default value of 15 seconds).

During each period, the controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. The controller manager obtains the metrics from either the resource metrics API (for per-pod resource metrics), or the custom metrics API (for all other metrics).

You can create your own HPA based on memory utilization. Here is an example:

apiVersion: autoscaling/v2beta2 
kind: HorizontalPodAutoscaler
metadata:
  name: php-memory-scale 
spec:
  scaleTargetRef:
    apiVersion: apps/v1 
    kind: Deployment 
    name: php-apache 
  minReplicas: 1 
  maxReplicas: 10 
  metrics: 
  - type: Resource
    resource:
      name: memory 
      target:
        type: AverageValue
        averageValue: 10Mi

You can also create your own Kubernetes HPA with custom metrics from Prometheus:

Autoscaling is an approach to automatically scale workloads up or down based on resource usage. The K8s Horizontal Pod Autoscaler:

  • is implemented as a control loop that periodically queries the Resource Metrics API (the metrics.k8s.io API) for core metrics like CPU and memory, and the Custom Metrics API (the external.metrics.k8s.io or custom.metrics.k8s.io API) for application-specific metrics. The latter are provided by “adapter” API servers offered by metrics-solution vendors; there are some known solutions, but none of those implementations are officially part of Kubernetes.
  • automatically scales the number of pods in a deployment or replica set based on the observed metrics.

In what follows we'll focus on the custom metrics because the Custom Metrics API made it possible for monitoring systems like Prometheus to expose application-specific metrics to the HPA controller.
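
As a rough sketch of what that could look like, the HPA below consumes a Prometheus metric through the custom metrics API. This assumes an adapter such as prometheus-adapter is installed and already exposes a metric (here hypothetically named nginx_connections_waiting); the resource names and target value are placeholders:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-custom-metric-scale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-server
  minReplicas: 1
  maxReplicas: 5
  metrics:
  # Pods metric: averaged across the pods targeted by the HPA,
  # served by the custom metrics adapter rather than metrics-server
  - type: Pods
    pods:
      metric:
        name: nginx_connections_waiting
      target:
        type: AverageValue
        averageValue: "100"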

Another solution might be to use KEDA. Look at this guide. Here is an example YAML that scales an nginx deployment based on the number of waiting connections reported by Prometheus, with a threshold of 500:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
 name: nginx-scale
 namespace: keda-hpa
spec:
 scaleTargetRef:
   kind: Deployment
   name: nginx-server
 minReplicaCount: 1
 maxReplicaCount: 5
 cooldownPeriod: 30
 pollingInterval: 1
 triggers:
 - type: prometheus
   metadata:
     serverAddress: https://prometheus_server/prometheus
     metricName: nginx_connections_waiting_keda
     query: |
       sum(nginx_connections_waiting{job="nginx"})
     threshold: "500"

Yes, we have the webhook, but we implemented the service as a custom service using am executor with a custom script, and we ran the required job from the ADO pipeline.

You can do this with an open source project called Robusta. (Disclaimer: I'm the maintainer.)

First, define which Prometheus alert you want to trigger on:

customPlaybooks:
- triggers:
  - on_prometheus_alert:
      alert_name: DiskSpaceAlertName
  actions:
  - disk_watcher: {}

Second, we need to write the actual action that runs when triggered. (Called disk_watcher above.) You can skip this step if someone has already written an action for your needs, as there are 50+ built-in actions already.

In this case, there is no built-in action so we need to write one in Python. (I would be happy to add a builtin one though :)

from robusta.api import *

@action
def disk_watcher(event: DeploymentEvent):
    deployment = event.get_deployment()

    # read / modify the resources here
    print(deployment.spec.template.spec.containers[0].resources)
    deployment.update()

    # fetch the relevant pod
    pod = RobustaPod.find_pod(deployment.metadata.name, deployment.metadata.namespace)

    # see what is using up disk space
    output = pod.exec("df -h")

    # create another pod
    other_output = RobustaPod.exec_in_debugger_pod("my-new-pod", pod.spec.nodeName, "cmd to run", "my-image")

    # send details to slack or any other destination
    event.add_enrichment([
        MarkdownBlock(f"the output from df is attached"),
        FileBlock("df.txt", output.encode()),
        FileBlock("other.txt", other_output.encode())
    ])
