
How to cause planned failures of a hosted k8s application

We currently have a hybrid Airflow pipeline setup. The core of Airflow is hosted on k8s with replicas.

I want to simulate node failures to ensure that replication and fault tolerance are working as expected for Airflow and the DAGs being run. It's critical that our pipeline fails gracefully, and I would like structured tests for it.

Besides manually turning off nodes for my cluster, how can I systematically simulate failures and track their impact on production?

I think you can simulate node failures by sending signals to your pods. In this link, you can find a list of Linux signals, but you don't need to test all of them:

  • SIGPWR: Power failure (System V)
  • SIGKILL: Kill signal; K8S sends this signal after the grace period expires
  • SIGTERM: Termination signal; K8S sends this signal to the containers in the pod to let them know that they are going to be shut down soon (SIGKILL follows once the grace period expires)

To send the signal, you can use this command:

kubectl exec <pod name> -c <container name> -- /sbin/killall5 -<signal code>

The signal codes are:

  • 30 for SIGPWR
  • 9 for SIGKILL
  • 15 for SIGTERM
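
As a sketch of how you might turn this into a repeatable test, you could loop over the Airflow worker pods and send the signal to each, then watch whether replacement pods come up and the DAGs recover. The namespace, label selector, and container name below are assumptions, adjust them for your own deployment:

#!/usr/bin/env bash
# Hypothetical values: adjust namespace, selector, and container for your Airflow deployment.
NAMESPACE="airflow"
SELECTOR="component=worker"
CONTAINER="airflow-worker"
SIGNAL=15   # SIGTERM; use 9 for SIGKILL to skip graceful shutdown

# Find the target pods by label.
PODS=$(kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" -o jsonpath='{.items[*].metadata.name}')

for POD in $PODS; do
  echo "Sending signal $SIGNAL to $POD ..."
  kubectl exec -n "$NAMESPACE" "$POD" -c "$CONTAINER" -- /sbin/killall5 -"$SIGNAL"
done

# Watch the pods get rescheduled, then check that the affected DAG runs are retried.
kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" -w

Running this against a staging copy of the cluster first lets you compare the observed DAG retries against what you expect before trying it anywhere near production.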
