
How to cause planned failures of a hosted k8s application

We currently have a hybrid Airflow pipeline setup. The core of Airflow is hosted on k8s with replicas.

I want to simulate node failures to ensure that replication and fault tolerance are working as expected for Airflow and the DAGs being run. It's critical that our pipeline fails gracefully, and I would like structured tests for it.

Besides manually turning off nodes for my cluster, how can I systematically simulate failures and track their impact on production?

I think you can simulate node failures by sending signals to your pods. In this link, you can find a list of Linux signals, but you don't need to test all of them:

  • SIGPWR: Power failure (System V)
  • SIGKILL: Kill signal; K8S sends this signal after the grace period expires
  • SIGTERM: Termination signal; K8S sends this signal to the containers in the pod to let them know that they are going to be shut down soon (SIGKILL follows once the grace period expires)

To send the signal, you can use this command:

kubectl exec <pod name> -c <container name> -- /sbin/killall5 -<signal code>

The signal codes are:

  • 30 for SIGPWR
  • 9 for SIGKILL
  • 15 for SIGTERM
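
As a sketch of how you might turn this into a repeatable test, you could loop over the Airflow worker pods and send the signal to each, then watch whether replacement pods come up and the DAGs recover. The namespace, label selector, and container name below are assumptions, adjust them for your own deployment:

#!/usr/bin/env bash
# Hypothetical values: adjust namespace, selector, and container for your Airflow deployment.
NAMESPACE="airflow"
SELECTOR="component=worker"
CONTAINER="airflow-worker"
SIGNAL=15   # SIGTERM; use 9 for SIGKILL to skip graceful shutdown

# Find the target pods by label.
PODS=$(kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" -o jsonpath='{.items[*].metadata.name}')

for POD in $PODS; do
  echo "Sending signal $SIGNAL to $POD ..."
  kubectl exec -n "$NAMESPACE" "$POD" -c "$CONTAINER" -- /sbin/killall5 -"$SIGNAL"
done

# Watch the pods get rescheduled, then check that the affected DAG runs are retried.
kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" -w

Running this against a staging copy of the cluster first lets you compare the observed DAG retries against what you expect before trying it anywhere near production.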
