How long will it take for an instance to be removed when using capacity provider with ECS?

I have enabled a capacity provider for my ECS cluster and set the min, max, and desired counts on the EC2 Auto Scaling group to 1, 3, and 2 respectively. I have also enabled auto scaling for the ECS tasks with min, max, and desired counts of 2, 6, and 2 respectively.

Each of these two tasks was launched on a separate instance when I deployed the whole setup using Terraform. When the load test was run, the ECS tasks and EC2 instances successfully scaled out to 6 and 3. But after the load test completed, the ECS tasks were scaled in, while the EC2 instances have still not been removed.

Also, does the target_capacity in managed_scaling indicate the threshold used to create the auto scaling policy for the EC2 cluster?

 resource "aws_autoscaling_group" "asg" {
  ...
  min_size             = var.asg_min_size
  max_size             = var.asg_max_size
  desired_capacity     = var.asg_desired_capacity
  protect_from_scale_in = true
  tags = [
    {
      "key"                 = "Name"
      "value"               = local.name
      "propagate_at_launch" = true
    },
    {
      "key"                 = "AmazonECSManaged"
      "value"               = ""
      "propagate_at_launch" = true 
    }
  ]
}

resource "aws_ecs_capacity_provider" "capacity_provider" {
   name = local.name

   auto_scaling_group_provider {
      auto_scaling_group_arn         = aws_autoscaling_group.asg.arn
      managed_termination_protection = "ENABLED"

      managed_scaling {
           maximum_scaling_step_size = 4
           minimum_scaling_step_size = 1
           status                    = "ENABLED"
           target_capacity           = 70
      }
   }

  
   provisioner "local-exec" {
      when = destroy

      command = "aws ecs put-cluster-capacity-providers --cluster ${self.name} --capacity-providers [] --default-capacity-provider-strategy []"
   }
}

resource "aws_ecs_cluster" "cluster" {
  name      = local.name
  capacity_providers = [
    aws_ecs_capacity_provider.capacity_provider.name,
  ]
  tags = merge(
    {
      "Name"        = local.name,
      "Environment" = var.environment,
      "Description" = var.description,
      "Service"     = var.service,
    },
    var.tags
  )
}

You've set a target capacity of 70 for your capacity provider, so the capacity provider doesn't want to go over that utilisation.

When you have 2 tasks running on 2 different instances, you have 100% utilisation, because capacity is calculated from how many instances are running non-daemon tasks. So if your target capacity is anything less than 100, the capacity provider won't scale in if doing so would leave only non-empty instances available.

If you had a target capacity of 60 and your ASG was allowed to scale to a maximum of 4 instances, it would still attempt to scale out to 4 and leave 2 empty instances, because having only 1 empty instance (2 busy out of 3) would leave the utilisation at 66.6%, which is higher than that lower target capacity.
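
To make the arithmetic concrete, here is a minimal sketch of the CapacityProviderReservation metric, assuming (as above) that an instance counts as "used" when it runs at least one non-daemon task. The reservation function is an illustrative helper, not an AWS API:

def reservation(instances_needed, instances_running):
    # CapacityProviderReservation = M / N * 100, where N is the number of
    # running instances and M is the number the provider thinks it needs.
    return 100.0 * instances_needed / instances_running

print(reservation(2, 2))  # 100.0 -> above a target of 70, so scale out
print(reservation(2, 3))  # ~66.7 -> still above a target of 60, so scale out again
print(reservation(2, 4))  # 50.0  -> at or below 60, so the ASG settles at 4 instances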

This AWS deep dive blog post on ECS capacity providers is a good read if you are just starting to use capacity providers. I think I must have read it about a dozen times when I started using them, and I still find the scaling mechanism slightly unusual. It does solve some key issues for us around naive ASG scaling based on ECS memory/CPU reservation, though. It also allows scaling to zero if you don't want any spare capacity (eg set a target capacity of 100) and don't mind waiting for instances to scale out.
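
Continuing the hypothetical sketch above, scale-to-zero falls out of the same formula: with no non-daemon tasks running there are no needed instances, so the metric drops to 0 and a target of 100 lets the ASG scale all the way in:

print(reservation(0, 2))  # 0.0 -> far below a target of 100, so scale in towards zero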

  1. Ensure that the capacity provider and its backing Auto Scaling group are both marked as scale-in protected, and that all running instances are marked that way as well if they were created before the Auto Scaling parameter change.
  2. Decrease the size of the Auto Scaling group. This by itself will not cause any change. However, if any instances are not scale-in protected, they will immediately be dropped and the tasks running on them killed.
  3. In the cluster's Instances tab, mark the instance with the lowest load and set it to drain (see the sketch after this list). This will initiate a gradual move of tasks off that instance and eventually its termination as soon as it is free of running tasks.
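
A minimal boto3 sketch of steps 1 and 3, assuming hypothetical cluster, ASG, and instance identifiers (all placeholders):

import boto3

ecs = boto3.client("ecs")
autoscaling = boto3.client("autoscaling")

# Step 1: mark an existing instance as scale-in protected; instances launched
# after protect_from_scale_in was enabled on the ASG inherit it already.
autoscaling.set_instance_protection(
    AutoScalingGroupName="my-asg",            # placeholder
    InstanceIds=["i-0123456789abcdef0"],      # placeholder
    ProtectedFromScaleIn=True,
)

# Step 3: drain the least-loaded container instance so ECS migrates its
# tasks away before the instance is terminated.
ecs.update_container_instances_state(
    cluster="my-cluster",                             # placeholder
    containerInstances=["<container-instance-arn>"],  # placeholder
    status="DRAINING",
)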

NOTE: If your cluster was created via a template and assigned some capacity, you may need to trace through the alarms that preserve that capacity if drained instances "reappear", and adjust everything to the numbers you really need.
