How to simulate Power Failure In Kubernetes
I have my rook-ceph cluster running on AWS. It's loaded up with data. Is there any way to simulate a power failure so that I can test the behaviour of my cluster?
With Docker you can send a container the SIGPWR signal ("Power failure", System V):
docker kill --signal="SIGPWR" <container>
and from Kubernetes:
kubectl exec <pod> -- /killme.sh
where the script killme.sh is:
beginning of script-----
#!/bin/bash
# Find the PID(s) of the process to signal (iperf in this example)
kiperf=$(pidof iperf)
# Send signal 30 (SIGPWR on Linux x86 and ARM) to every iperf process found
kill -30 $kiperf
script end -------------
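A slightly more defensive variant of the script above could look like this. It is only a minimal sketch, assuming bash on Linux; the function name signal_power_fail is hypothetical. It sends SIGPWR by name instead of by number (the number differs between architectures) and reports how many processes were actually signalled. Keep in mind that a signal only reaches the target process; it does not cut power to the node.

```shell
#!/bin/bash
# Hypothetical helper: send SIGPWR to every PID passed as an argument
# and print how many processes were successfully signalled.
signal_power_fail() {
  local count=0 pid
  for pid in "$@"; do
    # Using the name PWR avoids hard-coding the signal number.
    if kill -s PWR "$pid" 2>/dev/null; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Example usage inside the pod (iperf is the process from the script above):
#   signal_power_fail $(pidof iperf)
```

If the function prints 0, no matching process was found or the signal could not be delivered.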
It depends on the purpose of your crash test. I see two options:
You want to test whether you deployed Kubernetes on AWS correctly: then I'd terminate the related AWS EC2 instance (or set of instances).
You want to test whether your end application is resilient to Kubernetes Node failures: then I'd just check which Pods are running on the given Node and kill them all suddenly with:
kubectl delete pods <pod> --grace-period=0 --force
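The two options above can be sketched as a pair of shell helpers. This is an illustration under stated assumptions, not a definitive procedure: the function names are hypothetical, the node name and instance ID are placeholders supplied by the caller, and both functions only print the commands so they can be reviewed before anything is deleted.

```shell
#!/bin/bash
# Sketch only: both functions PRINT commands instead of running them.

# Option 1: print the AWS CLI command that terminates one EC2 instance.
terminate_instance_cmd() {
  printf 'aws ec2 terminate-instances --instance-ids %s\n' "$1"
}

# Option 2: print one force-delete command per pod scheduled on a node.
force_delete_cmds_for_node() {
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=$1" \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name \
    --no-headers |
  while read -r ns name; do
    printf 'kubectl delete pods %s --namespace %s --grace-period=0 --force\n' \
      "$name" "$ns"
  done
}
```

Running force_delete_cmds_for_node with a node name lists the commands; piping that output to sh would execute them.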
Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.
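Developers call these unavoidable cases involuntary disruptions to an application. Examples are a hardware failure of the physical machine backing the node, a kernel panic, or the node disappearing from the cluster due to a network partition.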
Developers call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator.
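Typical application owner actions include deleting the Deployment or other controller that manages the Pod, updating a Deployment's Pod template (which causes a restart), and directly deleting a Pod.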
You can find more information here: kubernetes-disruption, application-disruption.
You can set up Prometheus on your cluster and measure metrics during the failure.
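As a rough illustration of sampling a metric while the failure is in progress, the sketch below polls the Prometheus HTTP API (/api/v1/query) with curl. The PROM_URL default, the function names, and the choice of the built-in up metric are assumptions; substitute whatever address and query fit your cluster.

```shell
#!/bin/bash
# Assumption: Prometheus is reachable at PROM_URL (e.g. via a port-forward).
PROM_URL=${PROM_URL:-http://localhost:9090}

# Run one instant query against the Prometheus HTTP API.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# sample_metric QUERY COUNT INTERVAL_SECONDS
# Prints one line per sample: a UTC timestamp followed by the raw JSON reply.
sample_metric() {
  local query=$1 count=$2 interval=$3 i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s %s\n' "$(date -u +%FT%TZ)" "$(prom_query "$query")"
    i=$((i + 1))
    sleep "$interval"
  done
}

# Example usage: sample the 'up' metric every 5 seconds, 12 times.
#   sample_metric up 12 5 > failure-test.log
```

Starting the sampler just before triggering the failure gives a timestamped record to compare against the moment the pods or instances were killed.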