Receive alert messages from kubernetes in a pod when another pod dies

Scenario:

We have a K8s (AKS) cluster with a Deployment of 3 replicas running a .NET container. The replicas use a Redis lock to access a shared resource. If one replica dies while holding the lock, the lock is not released, so the other replicas must wait until it expires before they can proceed. To reduce this delay we could shorten the lock expiration timeout, but if the lock expires too early, while a holder is still working on the shared resource, we get a race condition. Other solutions are possible (such as renewing a lock lease), but none seems robust enough.

Is there a way to receive an event from the k8s engine when a pod dies?

The remaining live replicas would then remove the lock and carry on with their work. There would be no expiration issues (the lock would be created persistent, without expiration).

The simplest solution is probably to add a lifecycle.preStop step, in which the lock file is freed or the other containers are signaled. Something like this:

[...]
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh","-c","echo free > /tmp/lockfile"]

or command: ["/bin/sh","-c","curl -X POST -F 'status=free' http://<service-name>:8080/lock"]

From the docs:

Kubernetes sends the preStop event immediately before the Container is terminated

Edit

Regarding crashing pods: there is no tool for that, and especially none that reports .NET specifics.

You can observe container crashes: they are published as events, which you can list with kubectl get events. So there is a different approach:

You can use the Kubernetes C# Client to watch for those events. The cleanest solution would be to have a second application running in the same namespace that parses the events and signals the other applications that one has died.
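To illustrate the event-parsing part, here is a minimal sketch in Python (not the C# client) that takes the JSON printed by kubectl get events -o json and returns the names of pods that appear to have died. The function name dead_pods and the set DEATH_REASONS are hypothetical, and which event reasons count as "dead" is an assumption you would need to tune for your own cluster.

```python
import json
from typing import List

# Assumption: events with these reasons are treated as "a container died".
# Check the reasons your cluster actually emits and adjust this set.
DEATH_REASONS = {"BackOff", "Killing", "Failed"}


def dead_pods(events_json: str) -> List[str]:
    """Return the unique pod names from events whose reason is in DEATH_REASONS.

    events_json is expected to look like the output of
    `kubectl get events -o json`, i.e. an object with an "items" list.
    """
    items = json.loads(events_json).get("items", [])
    names: List[str] = []
    for event in items:
        involved = event.get("involvedObject", {})
        # Only pod events are relevant; node or deployment events are skipped.
        if involved.get("kind") != "Pod":
            continue
        if event.get("reason") not in DEATH_REASONS:
            continue
        name = involved.get("name")
        if name and name not in names:
            names.append(name)
    return names
```

The watcher application could call something like this on each batch of events and then notify the remaining replicas, for example through the same HTTP endpoint used in the preStop example.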

Another solution that comes to mind is a sidecar container in the pod that periodically checks the health of the main application. If the application is dead, the sidecar can send a signal to the other pods.
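The sidecar idea could be sketched roughly as follows, again in Python for brevity. Everything here is an assumption for illustration: the names http_healthy and watch, the failure threshold, the polling interval, and the existence of an HTTP health endpoint on the main application.

```python
import time
import urllib.request
from typing import Callable


def http_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, timeouts and refused connections are OSError subclasses.
        return False


def watch(
    check: Callable[[], bool],
    on_dead: Callable[[], None],
    max_failures: int = 3,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll check() and call on_dead() once after max_failures consecutive failures."""
    failures = 0
    while True:
        if check():
            failures = 0  # a single success resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                on_dead()
                return
        sleep(interval)
```

A sidecar might run it as watch(lambda: http_healthy("http://localhost:8080/healthz"), notify_peers), where the URL and notify_peers are placeholders. Requiring several consecutive failures before signaling is a design choice that lowers the chance of releasing the lock because of one slow response.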
