How to simulate Power Failure In Kubernetes

I have my rook-ceph cluster running on AWS. It is loaded with data. Is there any way to simulate a POWER FAILURE so that I can test the behaviour of my cluster?

From Docker you can send the "SIGPWR" signal, which indicates a power failure (System V):

docker kill --signal="SIGPWR" <container>

and from Kubernetes:

kubectl exec <pod> -- /killme.sh

and the script killme.sh:

beginning of script-----
#!/bin/bash
# Find the PID of the process to signal (iperf in this example)
kiperf=$(pidof iperf)
# Send signal 30 (SIGPWR on Linux) to that process
kill -30 $kiperf
script end -------------

Signal 30 (SIGPWR on Linux) you can find here.
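
As a minimal alternative sketch (assuming pidof is available in the container image and the target process is iperf, as in the script above), the same signal can be sent without copying a script into the pod:

kubectl exec <pod> -- /bin/sh -c 'kill -30 $(pidof iperf)'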

It depends on what the purpose of your crash test is. I see two options:

  1. You want to test whether you correctly deployed Kubernetes on AWS - then I'd terminate the related AWS EC2 instance (or set of instances); see the sketch after this list.

  2. You want to test whether your end application is resilient to Kubernetes node failures - then I'd just check which Pods are running on the given node and kill them all suddenly with:

kubectl delete pods <pod> --grace-period=0 --force
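
A minimal sketch of both options (assuming the AWS CLI is configured, and treating <instance-id> and <node> as placeholders you substitute yourself):

# Option 1: terminate the EC2 instance backing the node (abrupt, like pulling the plug)
aws ec2 terminate-instances --instance-ids <instance-id>

# Option 2: force-delete every Pod scheduled on a given node (current namespace)
kubectl get pods --field-selector spec.nodeName=<node> -o name \
  | xargs -r kubectl delete --grace-period=0 --force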

Pods in a cluster do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.

Developers call these unavoidable cases involuntary disruptions to an application. Examples are:

  • a hardware failure of the physical machine backing the node
  • cluster administrator deletes the VM (instance) by mistake
  • cloud provider or hypervisor failure makes the VM disappear
  • a kernel panic
  • the node disappears from the cluster due to a cluster network partition
  • eviction of a pod due to the node being out-of-resources

Except for the out-of-resources condition, all of these conditions should be familiar to most users; they are not specific to Kubernetes.

Developers call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator.

Typical application owner actions include:

  • deleting the deployment or other controller that manages the pod
  • updating a deployment's pod template, causing a restart
  • directly deleting a pod (e.g. by accident)

You can find more information here: kubernetes-disruption, application-disruption.

You can set up Prometheus on your cluster and measure metrics during the failure.
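
For example, a minimal sketch for watching Ceph health during the test (assuming the default Rook service name rook-ceph-mgr and the Ceph mgr Prometheus module on port 9283; adjust to your deployment):

kubectl -n rook-ceph port-forward svc/rook-ceph-mgr 9283:9283 &
curl -s http://localhost:9283/metrics | grep ceph_health_status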
