
AWS ECS: Auto-Scaling an EC2 Auto-Scaling Group with Single-Container Hosts

I have a rather interesting situation that I'm trying to figure out how to configure on AWS ECS/EC2.

I have a Dockerized application with the following requirements:

  • Very low CPU usage (~256 CPU units, i.e. 0.25 vCPU)
  • Moderate memory usage (~256 MB)
  • Each container needs a public IP address that is assigned only to that container (it doesn't share it with any other container).

Fargate is not an option due to the cost, so we're looking at EC2-based solutions.

Since CPU and memory usage are low and each container needs its own public IP address, the best option for an ECS capacity provider seems to be an EC2 auto-scaling group using the smallest instances (t4g.nano, t3a.nano, etc.) with either the host or bridge networking mode. Either mode limits each host to a single container if I explicitly specify a static host/container port mapping. This gives me a 1-to-1 mapping of hosts to containers, which is what I need.
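
As a rough sketch of the static port mapping described above: the container name, image, and port 8080 are hypothetical placeholders, and only the resource name and local.task_family come from the configuration in this post.

```hcl
# Sketch only: container name, image, and port 8080 are hypothetical placeholders.
resource "aws_ecs_task_definition" "this" {
  family                   = local.task_family
  requires_compatibilities = ["EC2"]
  network_mode             = "bridge" # "host" has the same effect for a fixed port

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = "example/app:latest"
      cpu       = 256 # CPU units (0.25 vCPU)
      memory    = 256 # hard memory limit in MiB
      essential = true
      portMappings = [
        {
          containerPort = 8080
          hostPort      = 8080 # static host port: a second task cannot claim it on the same instance
          protocol      = "tcp"
        }
      ]
    }
  ])
}
```

Because the host port is fixed, the scheduler can place at most one copy of this task on each container instance.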

The issue is, how do I set up ECS cluster-managed autoscaling for this?

I've configured an EC2 auto-scaling group (Terraform):

resource "aws_autoscaling_group" "ecs" {
  name                = "ecs"
  vpc_zone_identifier = var.subnet_ids
  min_size            = 1
  max_size            = 20
  capacity_rebalance  = true
  default_cooldown    = 0
  health_check_type   = "EC2"
  mixed_instances_policy {
    ...
  }
  instance_refresh {
    strategy = "Rolling"
  }
}

I've configured the auto-scaling group as an ECS Capacity Provider with Managed Scaling:

resource "aws_ecs_capacity_provider" "ec2" {
  name = "ec2"
  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.ecs.arn
    managed_scaling {
      target_capacity           = 100
      instance_warmup_period    = 30
      minimum_scaling_step_size = 1
      maximum_scaling_step_size = aws_autoscaling_group.ecs.max_size
      status                    = "ENABLED"
    }
    managed_termination_protection = "DISABLED"
  }
}

I've configured this capacity provider as the one and only provider for the ECS cluster:

resource "aws_ecs_cluster_capacity_providers" "this" {
  cluster_name = aws_ecs_cluster.this.name
  capacity_providers = [
    aws_ecs_capacity_provider.ec2.name
  ]
  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.ec2.name
    weight            = 100
    base              = 0
  }
}

I've set up an ECS service:

resource "aws_ecs_service" "this" {
  name            = local.task_family
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = 1
  launch_type     = "EC2"
  lifecycle {
    ignore_changes = [desired_count]
  }
}

I've set up an App Autoscaling Target for the ECS service:

resource "aws_appautoscaling_target" "ecs" {
  min_capacity       = 5
  max_capacity       = 20
  resource_id        = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.this.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

And I've set up an App Autoscaling Policy for that target:

resource "aws_appautoscaling_policy" "ecs_policy" {
  name               = "ecs-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value       = 70
    scale_in_cooldown  = 0
    scale_out_cooldown = 0
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}

This "works" in the sense that it deploys, the service runs, and my application is functional. However, the scaling is not working. As you can see in the aws_autoscaling_group, I've set the minimum to 1 instance and the maximum to 20 instances. In the aws_appautoscaling_target, I have a minimum of 5 (it would be 1 in production, but 5 for testing) and a maximum of 20 (the maximum matches the maximum number of instances, since the mapping is 1-to-1).

When I deploy this, the ECS service in the AWS console shows:

  • Desired count: 5
  • Pending count: 0
  • Running count: 1

And in the events log, it says:

service my-service was unable to place a task because no container instance met all of its requirements. The closest matching container-instance xyzabc1234 has insufficient memory available.

So it's trying to achieve the desired minimum number of containers (5), and it's recognizing that there are insufficient EC2 instances, but for some reason (and this is what I can't figure out) it's not scaling out the number of EC2 instances to meet the desired container count.

AWS's documentation says:

When launched tasks cannot be placed on available instances, the Auto Scaling group scales-out by launching new instances. When there are running instances with no tasks, the Auto Scaling group scales-in by terminating an instance with no running tasks.

Since the launched tasks cannot be placed on any of the available instances, it would seem that it should automatically scale out the Auto Scaling Group.

Any ideas on why it's failing to do so?

I found the issue:

resource "aws_ecs_service" "this" {
...
  launch_type     = "EC2"
...
}

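If you specify a launch type, it overrides the cluster's default capacity provider strategy, so the service's tasks are not associated with the capacity provider and the managed EC2 autoscaling is never used. The correct approach is to drop launch_type and set a capacity provider strategy instead: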

resource "aws_ecs_service" "this" {
  name            = local.task_family
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = 1

  capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.ec2.name
    weight            = 100
    base              = 0
  }

  lifecycle {
    ignore_changes = [desired_count]
  }
}

If you provide neither launch_type nor capacity_provider_strategy, the service is supposed to use the cluster's default strategy (and it does), but Terraform then shows a perpetual difference.
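
If you would rather omit both arguments and rely on the cluster default, one commonly suggested workaround for the perpetual difference is to have Terraform ignore the strategy that ECS fills in. This is a sketch of that alternative, under the assumption that the recurring diff is on capacity_provider_strategy; I have not verified it against this exact setup.

```hcl
# Alternative to the block above, not an addition to it.
resource "aws_ecs_service" "this" {
  name            = local.task_family
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = 1

  # Neither launch_type nor capacity_provider_strategy is set, so ECS applies
  # the cluster's default capacity provider strategy.
  lifecycle {
    # Assumption: ignoring the server-populated strategy stops the recurring diff.
    ignore_changes = [desired_count, capacity_provider_strategy]
  }
}
```

The trade-off is that later changes to the strategy in Terraform are also ignored, which is why setting it explicitly, as above, is the safer default.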

After this change, everything started scaling properly!
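
A rough way to see why, based on how AWS describes cluster auto scaling: ECS publishes a CapacityProviderReservation metric for the capacity provider and scales the Auto Scaling group to keep it at target_capacity. Here M is the number of instances needed and N is the number currently running. The numbers plug in the test setup from this post (5 desired tasks, 1 running instance), assuming one task per instance.

```latex
\text{CapacityProviderReservation} = \frac{M}{N} \times 100
\qquad
M = 5,\; N = 1 \;\Rightarrow\; \frac{5}{1} \times 100 = 500 > 100 = \texttt{target\_capacity}
```

A value above the target triggers a scale-out. As far as I can tell, tasks started with launch_type = "EC2" are not counted in M, which would explain why the group stayed at one instance before the change.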
