
How can I replace a specific instance in an AWS Auto Scaling Group?

Original post: 2020-05-15 16:50:25 | Tags: amazon-web-services, amazon-ec2, aws-auto-scaling

I'm looking for a procedure to replace a specific instance in an AWS Auto Scaling group while maintaining AZ "balance" and without reducing capacity while a new instance is provisioned.

Occasionally, we have reason to terminate a specific EC2 instance in an Auto Scaling group, and we have struggled to find an efficient procedure for doing so. I know that I can terminate the instance directly and it will be replaced, but that temporarily reduces the overall capacity of the group while the new instance is provisioned. In our case this takes tens of minutes, as we must set up and deploy our software before the ALB can send requests to the instance.

If we increase the desired_capacity by 1, we can prepare a new instance in advance, but there is no guarantee that it will be created in the same AZ as the instance we wish to terminate. In addition, if I terminate the offending instance and immediately reduce the desired_capacity, will the Auto Scaling group terminate another instance?

So what is the best way to manage this procedure?

2 Answers

You can temporarily suspend and resume specific scaling processes. With this feature you can achieve the desired result in multiple ways, two of which I've described below:

A: Use the Auto Scaling Group's rebalance feature

1. Increase the Auto Scaling Group's desired instance count by 1 and wait for the new instance to be available
2. Temporarily suspend the Launch scaling process (this prevents an automatic launch of a new instance during the next step)
3. Terminate the faulty instance
4. Decrease the Auto Scaling Group's desired instance count by 1 (the number of desired instances and the actual number of instances should now be in sync again)
5. Resume the Launch scaling process. If the remaining instances are unbalanced, the Auto Scaling Group's AZRebalance process will pick this up and gradually rebalance across the AZs.
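The five steps of procedure A could be sketched in Python roughly as follows. This is a minimal sketch, not a tested runbook: it assumes `autoscaling` and `ec2` are boto3 clients (for example `boto3.client("autoscaling")` and `boto3.client("ec2")`), that the group's maximum size allows one extra instance, and that "InService" is a good enough signal that the new instance is ready. If your software takes longer to deploy than the instance takes to reach InService, you would need your own readiness check. The function name and the polling parameters are made up for illustration.

```python
import time


def replace_instance_with_rebalance(autoscaling, ec2, group_name, faulty_instance_id,
                                    poll_seconds=15, max_polls=120, sleep=time.sleep):
    """Procedure A: add one instance first, then remove the faulty one."""

    def describe():
        resp = autoscaling.describe_auto_scaling_groups(
            AutoScalingGroupNames=[group_name])
        return resp["AutoScalingGroups"][0]

    original = describe()["DesiredCapacity"]

    # Step 1: raise desired capacity by one and wait for the extra instance.
    autoscaling.set_desired_capacity(AutoScalingGroupName=group_name,
                                     DesiredCapacity=original + 1)
    for _ in range(max_polls):
        in_service = [i for i in describe()["Instances"]
                      if i["LifecycleState"] == "InService"]
        if len(in_service) >= original + 1:
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError("extra instance never reached InService")

    # Step 2: suspend Launch so the termination below is not backfilled.
    autoscaling.suspend_processes(AutoScalingGroupName=group_name,
                                  ScalingProcesses=["Launch"])
    try:
        # Step 3: terminate the faulty instance directly through EC2.
        ec2.terminate_instances(InstanceIds=[faulty_instance_id])
        # Step 4: bring desired capacity back to its original value.
        autoscaling.set_desired_capacity(AutoScalingGroupName=group_name,
                                         DesiredCapacity=original)
    finally:
        # Step 5: always resume Launch, even if a call above failed.
        autoscaling.resume_processes(AutoScalingGroupName=group_name,
                                     ScalingProcesses=["Launch"])
    return original
```

The `try`/`finally` is a design choice of this sketch: leaving Launch suspended after a failed call would silently stop the group from scaling out.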

B: Explicitly start a new instance in the desired AZ

1. Start a separate instance in the desired AZ
2. Temporarily suspend the Terminate scaling process (this prevents an automatic termination of the additional instance during the next step)
3. Attach the instance from (1.) to the Auto Scaling Group (note that attaching an instance also raises the group's desired capacity by one)
4. Terminate the original instance, then check that the number of desired instances and the actual number of instances are in sync again; lower the desired count by 1 if they are not
5. Resume the Terminate scaling process
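Steps 2 to 5 of procedure B might look like the following with boto3 clients. This is only a sketch under assumptions: the separate instance from step 1 has already been launched in the desired AZ and is running, `autoscaling` and `ec2` are boto3 clients, and the function name is invented here. The AWS documentation states that attaching an instance increases the group's desired capacity by the number of instances attached, so the sketch reports the desired capacity and the instance count at the end instead of assuming that they match.

```python
def swap_in_prepared_instance(autoscaling, ec2, group_name,
                              new_instance_id, original_instance_id):
    """Procedure B, steps 2-5: attach a prepared instance, drop the old one."""
    # Step 2: suspend Terminate so the group cannot scale in during the swap.
    autoscaling.suspend_processes(AutoScalingGroupName=group_name,
                                  ScalingProcesses=["Terminate"])
    try:
        # Step 3: attach the instance that was launched outside the group.
        autoscaling.attach_instances(InstanceIds=[new_instance_id],
                                     AutoScalingGroupName=group_name)
        # Step 4: terminate the original instance directly through EC2.
        ec2.terminate_instances(InstanceIds=[original_instance_id])
    finally:
        # Step 5: always resume Terminate, even if a call above failed.
        autoscaling.resume_processes(AutoScalingGroupName=group_name,
                                     ScalingProcesses=["Terminate"])

    # Snapshot of the group so the caller can compare desired vs. actual.
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name])["AutoScalingGroups"][0]
    return group["DesiredCapacity"], len(group["Instances"])
```

If the returned desired capacity is one higher than intended, a call to `set_desired_capacity` with the intended value would bring it back down.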

Auto Scaling provides the ability to:

Attach a specific instance (which was created outside of Auto Scaling) to the Auto Scaling group
Detach a specific instance from the Auto Scaling group
Terminate a specific instance in an Auto Scaling group
Temporarily place an instance in an Auto Scaling group into a standby state

When detaching, terminating or placing in standby, the Desired Capacity of the Auto Scaling group can be automatically decremented so that no replacement instance is launched, or it can be kept the same so that a replacement instance is launched.

It would generally be a good idea to have Auto Scaling launch any new instances, so that all instances are identical. Thus, if you are concerned about a capacity drop, you should increment the Desired Capacity to launch a new instance, then terminate the unwanted instance from the Auto Scaling group with a capacity decrease to return the group to the previous Desired Capacity.
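The increment-then-terminate approach described above could be sketched like this. It assumes `autoscaling` is a boto3 Auto Scaling client and that the group's maximum size permits one extra instance. `wait_for_capacity` is a hypothetical callback that you would supply yourself; it should block until the group really has the requested number of ready instances (including any software deployment).

```python
def replace_without_capacity_drop(autoscaling, group_name, unwanted_instance_id,
                                  wait_for_capacity):
    """Launch a replacement first, then terminate with a capacity decrement."""
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name])["AutoScalingGroups"][0]
    original = group["DesiredCapacity"]

    # Let Auto Scaling launch the new instance so it matches its siblings.
    autoscaling.set_desired_capacity(AutoScalingGroupName=group_name,
                                     DesiredCapacity=original + 1)

    # Hypothetical caller-supplied wait; not part of boto3.
    wait_for_capacity(original + 1)

    # Terminate the unwanted instance and decrement Desired Capacity in the
    # same call, so no further replacement is launched.
    return autoscaling.terminate_instance_in_auto_scaling_group(
        InstanceId=unwanted_instance_id,
        ShouldDecrementDesiredCapacity=True)
```

Because the decrement happens in the same API call as the termination, there is no window in which the group sees fewer instances than desired and launches another one.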

You are correct that the launched instance is not guaranteed to be in the same AZ as the one being removed. Auto Scaling aims to balance AZs: it launches a new instance in the AZ that has the fewest instances. Say two AZs have an equal number of instances and you wish to remove an instance from AZ A. Incrementing the Desired Capacity might launch an instance in AZ B. Once the unwanted instance has been removed, AZ B has two more instances than AZ A. Whether this is a problem depends on the total number of instances in the Auto Scaling group.
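The arithmetic of that example can be checked with a small simulation. This is a toy model, not how Auto Scaling is implemented: it only encodes the "launch in the AZ with the fewest instances" rule from the paragraph above, and it deliberately breaks ties against the AZ you are removing from, which is the worst case for balance. The function name is invented for illustration.

```python
from collections import Counter


def simulate_replacement(az_counts, remove_from):
    """Return (new per-AZ counts, spread) after launch-then-remove."""
    counts = Counter(az_counts)
    # Launch into the AZ with the fewest instances. On a tie, prefer an AZ
    # other than `remove_from` (worst-case assumption of this sketch).
    target = min(counts, key=lambda az: (counts[az], az == remove_from))
    counts[target] += 1
    # Now remove the unwanted instance from its AZ.
    counts[remove_from] -= 1
    spread = max(counts.values()) - min(counts.values())
    return dict(counts), spread


# Two AZs with three instances each, removing one instance from AZ A.
print(simulate_replacement({"A": 3, "B": 3}, remove_from="A"))
# prints ({'A': 2, 'B': 4}, 2)
```

With three instances per AZ, the worst case leaves a 2/4 split; with thirty per AZ, the same difference of two would be a 29/31 split, which is why the total group size matters.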

The recommendation to use multiple AZs is meant to handle situations where an AZ might fail. Such a failure would result in a temporary loss of instances while Auto Scaling launches new instances in the remaining AZs. If such a drop is a concern, it is recommended to run extra instances to absorb the temporary capacity drop. Thus, returning to your question, your Auto Scaling group should have sufficient capacity to handle one instance being removed and replaced. If a temporary drop in capacity is going to impact your system, then it would be a good idea to have extra instances launched, on the assumption that instances can and will fail occasionally. This will also help in the rare situation in which an AZ fails, since having extra capacity means that the system does not immediately lose 50% of its required minimum capacity.

Bottom line: Have sufficient capacity so that temporarily replacing a bad instance does not have a significant impact on the system. The concern about unbalanced AZs is minor (a difference of at most 2 instances between AZs) compared to the impact of losing 50% of capacity in an AZ outage when only the minimum required capacity is continually deployed.

At the end of the day, it really comes down to cost vs risk. Using more than 2 AZs can reduce the impact of AZ outages.
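To put rough numbers on that cost vs risk trade-off, here is a small back-of-the-envelope calculation. It assumes instances are spread as evenly as possible across AZs and that the fullest AZ is the one that fails; both function names are made up for this sketch.

```python
def fraction_lost_in_az_outage(total_instances, az_count):
    """Share of capacity lost if the fullest AZ fails (even spread assumed)."""
    largest_az = -(-total_instances // az_count)  # ceiling division
    return largest_az / total_instances


def instances_needed(required, az_count):
    """Smallest group size that still has `required` instances after one AZ fails."""
    total = required
    while total - (-(-total // az_count)) < required:
        total += 1
    return total


print(fraction_lost_in_az_outage(4, 2))   # prints 0.5
print(instances_needed(4, 2))             # prints 8
print(instances_needed(4, 3))             # prints 6
```

Under these assumptions, surviving an AZ outage with 4 required instances takes 8 instances across two AZs but only 6 across three, which is one way to see why more AZs reduce the cost of over-provisioning.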