
Updating an AWS ECS Service

I have a service running on AWS EC2 Container Service (ECS). My setup is a relatively simple one. It operates with a single task definition and the following details:

  • Desired capacity set at 2
  • Minimum healthy set at 50%
  • Maximum available set at 200%
  • Tasks run with 80% CPU and memory reservations

Initially, I am able to get the necessary EC2 instances registered to the cluster that holds the service without a problem. The associated task then starts running on the two instances. As expected – given the CPU and memory reservations – the tasks take up almost the entirety of the EC2 instances' resources.

Sometimes, I want the task to use a new version of the application it is running. In order to make this happen, I create a revision of the task, de-register the previous revision, and then update the service. Note that I have set the minimum healthy percent to require 2 * 0.50 = 1 task running at all times and the maximum percent to permit up to 2 * 2.00 = 4 tasks running.
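
The limits those two percentages impose can be computed directly. This is a minimal sketch assuming the rounding the ECS documentation describes (minimum healthy percent rounds up, maximum percent rounds down):

```python
import math

def deployment_task_bounds(desired_count: int,
                           minimum_healthy_percent: int,
                           maximum_percent: int) -> tuple:
    """Return (lower, upper) limits on running tasks during a deployment.

    Per the ECS docs, the minimum healthy count is rounded up and the
    maximum count is rounded down.
    """
    lower = math.ceil(desired_count * minimum_healthy_percent / 100)
    upper = math.floor(desired_count * maximum_percent / 100)
    return lower, upper

# With the settings above: desired 2, minimum healthy 50%, maximum 200%
print(deployment_task_bounds(2, 50, 200))  # (1, 4)
```

So the scheduler is permitted to run anywhere from 1 to 4 tasks while the deployment is in progress.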

Accordingly, I expected one of the tasks from the de-registered revision to be drained and taken offline so that one task from the new revision could be brought online. Then the process would repeat itself, bringing the deployment to a successful state.

Unfortunately, the cluster does nothing. In the events log, it tells me that it cannot place the new tasks, even though the process I have described above would permit it to do so.

How can I get the cluster to perform the behavior that I am expecting? I have only been able to get it to do so when I manually register another EC2 instance to the cluster and then tear it down after the update is complete (which is not desirable).

I have faced the same issue, where new tasks got stuck because there was no room to place them. The following snippet from the AWS documentation on updating a service helped me reach the decision below.

If your service has a desired number of four tasks and a maximum percent value of 200%, the scheduler may start four new tasks before stopping the four older tasks (provided that the cluster resources required to do this are available). The default value for maximum percent is 200%.

The cluster must have spare resources (container instances) available so that the new tasks can start and the older ones can drain.

These are the things I do:

  1. Before doing a service update, add roughly 20% capacity to your cluster. You can use the Auto Scaling group (ASG) command line to raise the desired capacity by 20%. This way you will have some additional instances during the deployment.

  2. Once you have the extra instances, the new tasks will start spinning up quickly and the older ones will start draining.
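
Step 1 amounts to a small calculation plus one Auto Scaling call. A sketch, assuming the 20% headroom suggested above (the ASG name in the comment is a placeholder):

```python
import math

def bumped_capacity(current_desired: int, headroom: float = 0.20) -> int:
    """Desired ASG capacity with ~20% headroom added for the deployment."""
    return math.ceil(current_desired * (1 + headroom))

# Apply the new value with the AWS CLI, e.g. (substitute your own ASG name):
#   aws autoscaling set-desired-capacity \
#       --auto-scaling-group-name my-ecs-asg \
#       --desired-capacity <bumped value>
print(bumped_capacity(2))  # 3
```

Rounding up matters: with only two instances, a 20% bump must still yield at least one whole extra instance.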

But does this mean I will have extra container instances?

Yes, during the deployment you will add some instances, and after the older tasks drain those extra instances will hang around unused. The way to remove them is:

Create a MemoryReservationLow alarm (a ~70% threshold in your case) with a long evaluation period, say 25 minutes, to be sure the capacity is genuinely over-provisioned. Once the extra servers are no longer in use, the cluster's memory reservation will drop below the threshold and they can be scaled in.
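
The alarm logic can be sketched as a pure function (the threshold and period counts are illustrative, matching the ~70% / 25-minute suggestion above — for example five 5-minute CloudWatch periods):

```python
def should_scale_in(reservation_samples: list,
                    threshold: float = 70.0,
                    required_periods: int = 5) -> bool:
    """Mimic the alarm: scale in only when the cluster's MemoryReservation
    metric has stayed below the threshold for every one of the last N
    evaluation periods (guarding against a transient dip)."""
    recent = reservation_samples[-required_periods:]
    return len(recent) == required_periods and all(s < threshold for s in recent)

print(should_scale_in([65, 60, 62, 58, 55]))  # True: sustained low reservation
print(should_scale_in([65, 60, 85, 58, 55]))  # False: one period back above 70%
```

In practice you would attach such an alarm to a scale-in policy on the ASG rather than evaluate it yourself; the long evaluation window is what keeps a mid-deployment dip from tearing instances down too early.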

I have seen this before. If your port mapping is attempting to map a static host port to the container within the task, you need more cluster instances.
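
The reason is that a given host port can be bound only once per instance, so the placement check reduces to a disjointness test (a simplified sketch, not the actual ECS scheduler):

```python
def can_place_task(host_ports_in_use: set, task_host_ports: set) -> bool:
    """A task with static host-port mappings can only be placed on an
    instance where none of its host ports are already bound."""
    return host_ports_in_use.isdisjoint(task_host_ports)

# The old revision already binds host port 80 on every instance, so a new
# revision that also wants host port 80 has nowhere to go:
print(can_place_task({80}, {80}))   # False
print(can_place_task(set(), {80}))  # True
```

This is why dynamic host ports (host port 0) behind a load balancer avoid the problem: each new task gets an unused ephemeral port instead of contending for the same one.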

Also, this could be because there is not enough available memory to meet the memory (soft or hard) limit requested by the container within the task.
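
That check is simple arithmetic. A sketch using illustrative numbers in the spirit of the question's 80% reservations (the instance size is hypothetical):

```python
def fits_in_memory(instance_memory_mib: int,
                   reserved_mib: int,
                   task_memory_mib: int) -> bool:
    """A new task fits only if its memory requirement is within the
    instance's remaining, unreserved memory."""
    return task_memory_mib <= instance_memory_mib - reserved_mib

# An instance exposing ~2000 MiB to ECS, already 80% reserved by the old
# task, cannot host a new task that also wants 800 MiB:
print(fits_in_memory(2000, 1600, 800))  # False: only 400 MiB remain
```

With 80% of CPU and memory reserved per instance, neither of the two instances has room for a new task, which is exactly why the deployment stalls until extra capacity appears.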
