
AWS ECS: Auto-Scaling an EC2 Auto-Scaling Group with Single-Container Hosts

I'm trying to figure out how to configure a somewhat unusual setup on AWS ECS/EC2.

I have a Dockerized application with the following requirements:

  • Very low CPU usage (~256 CPU units)
  • Moderate memory usage (~256 MB)
  • Each container needs a public IP address that is assigned only to that container (it doesn't share it with any other container).

Fargate is not an option due to the cost, so we're looking at EC2-based solutions.

Since the CPU and memory usage are low, and I need a unique public IP address for each container, the best option for an ECS capacity provider seems to be an EC2 auto-scaling group using the smallest instances (t4g.nano, t3a.nano, etc.), and either the host or bridge networking mode (either mode limits each host to a single container if I explicitly specify a static host/container port mapping). This gives me a 1-to-1 mapping of hosts to containers, which is what I need.
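For illustration, here is a minimal sketch of the kind of task definition that paragraph describes. The resource name, family, container name, image, and port 8080 are hypothetical placeholders, not values from my actual configuration. It shows bridge networking with a fixed hostPort, which is what prevents a second copy of the task from landing on the same instance.

```hcl
# Sketch only: "example", "example-task", "app", the image, and port 8080
# are hypothetical placeholders, not values from the real configuration.
resource "aws_ecs_task_definition" "example" {
  family                   = "example-task"
  network_mode             = "bridge"
  requires_compatibilities = ["EC2"]

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = "example/app:latest"
      essential = true
      cpu       = 256 # CPU units (about a quarter of a vCPU)
      memory    = 256 # hard memory limit in MiB

      portMappings = [
        {
          containerPort = 8080
          hostPort      = 8080 # static host port: only one task per instance can bind it
          protocol      = "tcp"
        }
      ]
    }
  ])
}
```

With a fixed hostPort, ECS treats the port as a resource on the container instance, so the scheduler will not place a second task that needs the same port on that host.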

The issue is, how do I set up ECS cluster-managed autoscaling for this?

I've configured an EC2 auto-scaling group (Terraform):

resource "aws_autoscaling_group" "ecs" {
  name                = "ecs"
  vpc_zone_identifier = var.subnet_ids
  min_size            = 1
  max_size            = 20
  capacity_rebalance  = true
  default_cooldown    = 0
  health_check_type   = "EC2"
  mixed_instances_policy {
    ...
  }
  instance_refresh {
    strategy = "Rolling"
  }
}

I've configured the auto-scaling group as an ECS Capacity Provider with Managed Scaling:

resource "aws_ecs_capacity_provider" "ec2" {
  name = "ec2"
  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.ecs.arn
    managed_scaling {
      target_capacity           = 100
      instance_warmup_period    = 30
      minimum_scaling_step_size = 1
      maximum_scaling_step_size = aws_autoscaling_group.ecs.max_size
      status                    = "ENABLED"
    }
    managed_termination_protection = "DISABLED"
  }
}

I've configured this capacity provider as the one and only provider for the ECS cluster:

resource "aws_ecs_cluster_capacity_providers" "this" {
  cluster_name = aws_ecs_cluster.this.name
  capacity_providers = [
    aws_ecs_capacity_provider.ec2.name
  ]
  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.ec2.name
    weight            = 100
    base              = 0
  }
}

I've set up an ECS service:

resource "aws_ecs_service" "this" {
  name            = local.task_family
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = 1
  launch_type     = "EC2"
  lifecycle {
    ignore_changes = [desired_count]
  }
}

I've set up an App Autoscaling Target for the ECS service:

resource "aws_appautoscaling_target" "ecs" {
  min_capacity       = 5
  max_capacity       = 20
  resource_id        = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.this.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

And I've set up an App Autoscaling Policy for that target:

resource "aws_appautoscaling_policy" "ecs_policy" {
  name               = "ecs-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value       = 70
    scale_in_cooldown  = 0
    scale_out_cooldown = 0
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}

This "works" in the sense that it deploys, the service runs, and my application is functional. However, the scaling is not working. As you can see in the aws_autoscaling_group, I've set the minimum to 1 instance and the maximum to 20 instances. In the aws_appautoscaling_target, I have a minimum of 5 (it would be 1 in production, but 5 for testing) and a maximum of 20 (the maximum matches the maximum number of instances, since the mapping is 1-to-1).

When I deploy this, the ECS service in the AWS console shows:

  • Desired count: 5
  • Pending count: 0
  • Running count: 1

And in the events log, it says:

service my-service was unable to place a task because no container instance met all of its requirements. The closest matching container-instance xyzabc1234 has insufficient memory available.

So it's trying to achieve the desired minimum number of containers (5), and it's recognizing that there are insufficient EC2 instances, but for some reason (and this is what I can't figure out) it's not scaling out the number of EC2 instances to meet the desired container count.

AWS's documentation says:

When launched tasks cannot be placed on available instances, the Auto Scaling group scales-out by launching new instances. When there are running instances with no tasks, the Auto Scaling group scales-in by terminating an instance with no running tasks.

Since the launched tasks cannot be placed on any of the available instances, it would seem that it should automatically scale out the Auto Scaling Group.

Any ideas on why it's failing to do so?

I found the issue:

resource "aws_ecs_service" "this" {
...
  launch_type     = "EC2"
...
}

If you specify a launch type, it overrides the cluster's default capacity provider strategy, so the service's tasks are not launched through the capacity provider and never trigger the managed EC2 autoscaling. The correct approach is to drop launch_type and set a capacity_provider_strategy instead:

resource "aws_ecs_service" "this" {
  name            = local.task_family
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = 1

  capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.ec2.name
    weight            = 100
    base              = 0
  }

  lifecycle {
    ignore_changes = [desired_count]
  }
}

If you provide neither launch_type nor capacity_provider_strategy, the service is supposed to use the cluster's default strategy (and it does), but Terraform then shows a perpetual difference on every plan.
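If you would rather rely on the cluster default than repeat the strategy in the service, one possible workaround for the perpetual difference is to tell Terraform to ignore that attribute. This is a sketch I have not verified against this exact setup; it assumes the diff comes only from the strategy that ECS fills in on the service after creation.

```hcl
resource "aws_ecs_service" "this" {
  name            = local.task_family
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = 1

  # Neither launch_type nor capacity_provider_strategy is set, so the
  # cluster's default capacity provider strategy applies.

  lifecycle {
    # Assumption: ignoring the strategy that ECS reports back hides the diff.
    ignore_changes = [desired_count, capacity_provider_strategy]
  }
}
```

The trade-off is that Terraform will also ignore any later, intentional change to the service's strategy, so setting capacity_provider_strategy explicitly as shown above is the more predictable option.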

After this change, everything started scaling properly!
