
How to autoscale servers in ECS?

I recently started using ECS. I was able to push a container image to ECR and create a task definition for my container with CPU/memory limits. My use case is that each container will be a long-running app (no web server, no port mapping needed). Containers will be started on demand, one at a time, and removed on demand, one at a time.

I am able to create a cluster with N server instances. But I'd like the server instances to scale up and down automatically. For example, if there isn't enough CPU/memory in the cluster, I'd like a new instance to be created.

And if there is an instance with no containers running on it, I'd like that specific instance to be scaled down / deleted. This is to avoid an automatic scale-down terminating a server instance that still has running tasks on it.

What steps are needed to achieve this?

Considering that you already have an ECS cluster created, AWS provides instructions on scaling cluster instances with CloudWatch alarms.

Assuming that you want to scale the cluster based on memory reservation, at a high level you would need to do the following:

  1. Create a Launch Configuration for your Auto Scaling Group.
  2. Create an Auto Scaling Group, so that the size of the cluster can be scaled up and down.
  3. Create a CloudWatch Alarm to scale the cluster up if the memory reservation is over 70%.
  4. Create a CloudWatch Alarm to scale the cluster down if the memory reservation is under 30%.

Because this is more my specialty, I wrote up an example CloudFormation template that should get you started with most of this:

Parameters:
  MinInstances:
    Type: Number
  MaxInstances:
    Type: Number
  InstanceType:
    Type: String
    AllowedValues:
      - t2.nano
      - t2.micro
      - t2.small
      - t2.medium
      - t2.large
  VpcSubnetIds:
    Type: String

Mappings:
  EcsInstanceAmis:
    us-east-2:
      Ami: ami-1c002379
    us-east-1:
      Ami: ami-9eb4b1e5
    us-west-2:
      Ami: ami-1d668865
    us-west-1:
      Ami: ami-4a2c192a
    eu-west-2:
      Ami: ami-cb1101af
    eu-west-1:
      Ami: ami-8fcc32f6
    eu-central-1:
      Ami: ami-0460cb6b
    ap-northeast-1:
      Ami: ami-b743bed1
    ap-southeast-2:
      Ami: ami-c1a6bda2
    ap-southeast-1:
      Ami: ami-9d1f7efe
    ca-central-1:
      Ami: ami-b677c9d2

Resources:
  Cluster:
    Type: AWS::ECS::Cluster
  Role:
    Type: AWS::IAM::Role
    Properties:
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          -
            Effect: Allow
            Action:
              - sts:AssumeRole
            Principal:
              Service:
                - ec2.amazonaws.com    
  InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref Role    
  LaunchConfiguration:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: !FindInMap [EcsInstanceAmis, !Ref "AWS::Region", Ami]
      InstanceType: !Ref InstanceType
      IamInstanceProfile: !Ref InstanceProfile
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          echo ECS_CLUSTER=${Cluster} >> /etc/ecs/ecs.config  
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: !Ref MinInstances
      MaxSize: !Ref MaxInstances
      LaunchConfigurationName: !Ref LaunchConfiguration
      HealthCheckGracePeriod: 300
      HealthCheckType: EC2
      VPCZoneIdentifier: !Split [",", !Ref VpcSubnetIds]
  # Scaling policies and alarms are top-level resources, siblings of AutoScalingGroup.
  ScaleUpPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName: !Ref AutoScalingGroup
      Cooldown: '1'
      ScalingAdjustment: '1'
  MemoryReservationAlarmHigh:
    Type: AWS::CloudWatch::Alarm
    Properties:
      EvaluationPeriods: '2'
      Statistic: Average
      Threshold: '70'
      AlarmDescription: Alarm if Cluster Memory Reservation is too high
      Period: '60'
      AlarmActions:
        - !Ref ScaleUpPolicy
      Namespace: AWS/ECS
      Dimensions:
        - Name: ClusterName
          Value: !Ref Cluster
      ComparisonOperator: GreaterThanThreshold
      MetricName: MemoryReservation
  ScaleDownPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName: !Ref AutoScalingGroup
      Cooldown: '1'
      ScalingAdjustment: '-1'
  MemoryReservationAlarmLow:
    Type: AWS::CloudWatch::Alarm
    Properties:
      EvaluationPeriods: '2'
      Statistic: Average
      Threshold: '30'
      AlarmDescription: Alarm if Cluster Memory Reservation is too low
      Period: '60'
      AlarmActions:
        - !Ref ScaleDownPolicy
      Namespace: AWS/ECS
      Dimensions:
        - Name: ClusterName
          Value: !Ref Cluster
      ComparisonOperator: LessThanThreshold
      MetricName: MemoryReservation

This creates an ECS cluster, a Launch Configuration, an Auto Scaling Group, and the alarms based on the ECS memory reservation.

Now we can get to the interesting discussions.

Why can't we scale up based on both CPU utilization and memory reservation?

The short answer is that you totally can, but you're likely to pay a lot for it. When this was written, EC2 charged partial instance hours as full hours, so every instance you created cost at least one hour. (AWS has since introduced per-second billing with a one-minute minimum for many instance types, which reduces this penalty.) Here is why that matters: imagine you have multiple alarms. Say you have a bunch of services that are currently running idle, and together they fill the cluster. Either the CPU alarm scales the cluster down, or the memory alarm scales the cluster up. One of these will likely scale the cluster to the point that its alarm is no longer triggered. After the cooldown period, the other alarm will undo that last action. After the next cooldown, the action will likely be redone. Thus instances are created and then destroyed repeatedly, on every other cooldown.
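To make the back-and-forth concrete, here is a rough simulation of two competing alarms. It is only a sketch under assumptions of mine: the instance sizes, the 70%/30% thresholds applied to memory and CPU respectively, one scaling action per cooldown, and the memory alarm winning ties are all illustrative, not how CloudWatch actually schedules actions.

```python
def simulate_two_alarms(instances, reserved_mem_mib, used_cpu_units, cooldowns,
                        mem_per_instance_mib=4096, cpu_per_instance_units=1024,
                        mem_high_pct=70.0, cpu_low_pct=30.0):
    """Return the instance count after each cooldown period.

    Hypothetical model: a memory-reservation alarm adds one instance when
    reservation is above mem_high_pct, and a CPU-utilization alarm removes
    one instance when utilization is below cpu_low_pct.
    """
    counts = []
    for _ in range(cooldowns):
        mem_pct = 100.0 * reserved_mem_mib / (instances * mem_per_instance_mib)
        cpu_pct = 100.0 * used_cpu_units / (instances * cpu_per_instance_units)
        if mem_pct > mem_high_pct:
            instances += 1          # memory alarm: scale up
        elif cpu_pct < cpu_low_pct and instances > 1:
            instances -= 1          # CPU alarm: scale down
        counts.append(instances)
    return counts


# Idle services: lots of memory reserved, very little CPU actually in use.
counts = simulate_two_alarms(instances=4, reserved_mem_mib=12000,
                             used_cpu_units=300, cooldowns=6)
launches = sum(1 for before, after in zip([4] + counts, counts) if after > before)
print(counts)    # [5, 4, 5, 4, 5, 4]
print(launches)  # 3
```

In this model the cluster never settles: every second cooldown launches a fresh instance that is terminated one cooldown later, and each launch is billed at least the minimum charge.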

After giving this a bunch of thought, the strategy I came up with was to use Application Auto Scaling for the ECS services based on CPU utilization, and to scale the cluster based on memory reservation. So if one service is running hot, an extra task is added to share the load. This slowly fills the cluster's memory reservation capacity. When the memory gets full, the cluster scales up. When a service cools down, it starts shutting down tasks. As the memory reservation on the cluster drops, the cluster is scaled down.
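A small sketch of how the cluster follows the services under this strategy. The task size (512 MiB), instance size (4096 MiB), the min/max bounds, and the helper names are assumptions for illustration; the 70%/30% thresholds are the ones from the template above, and the task counts stand in for whatever service-level scaling on CPU would produce.

```python
def memory_reservation_pct(task_count, task_mem_mib, instances, instance_mem_mib):
    """Cluster-wide memory reservation, as a percentage."""
    return 100.0 * task_count * task_mem_mib / (instances * instance_mem_mib)


def next_instance_count(instances, reservation_pct, min_instances, max_instances,
                        high_pct=70.0, low_pct=30.0):
    """Apply one scale-up or scale-down step based on memory reservation only."""
    if reservation_pct > high_pct:
        return min(max_instances, instances + 1)
    if reservation_pct < low_pct:
        return max(min_instances, instances - 1)
    return instances


def follow_service_load(task_counts, instances, task_mem_mib=512,
                        instance_mem_mib=4096, min_instances=1, max_instances=5):
    """Return the cluster size after each observed task count."""
    sizes = []
    for task_count in task_counts:
        pct = memory_reservation_pct(task_count, task_mem_mib,
                                     instances, instance_mem_mib)
        instances = next_instance_count(instances, pct,
                                        min_instances, max_instances)
        sizes.append(instances)
    return sizes


# Services heat up (more tasks), hold steady, then cool down (fewer tasks).
sizes = follow_service_load([6, 12, 18, 18, 6, 2], instances=2)
print(sizes)  # [2, 3, 4, 4, 3, 2]
```

Because only one metric drives the cluster size, the instance count rises and falls with the load instead of oscillating.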

The thresholds for the CloudWatch alarms might need some experimentation, based on your task definitions. If you set the scale-up threshold too high, the cluster may not scale up as memory gets consumed; then, when autoscaling goes to place another task, it will find that no instance in the cluster has enough memory available and will be unable to place it.
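The failure mode is easy to check with arithmetic. The numbers below are made up for illustration (three instances with 4096 MiB each, a 1500 MiB task), and the model ignores the memory that the ECS agent and OS keep back on a real instance.

```python
def cluster_reservation_pct(reserved_per_instance_mib, instance_mem_mib):
    """Cluster-wide memory reservation, as a percentage."""
    total = instance_mem_mib * len(reserved_per_instance_mib)
    return 100.0 * sum(reserved_per_instance_mib) / total


def can_place(task_mem_mib, reserved_per_instance_mib, instance_mem_mib):
    """True if at least one instance has enough unreserved memory for the task."""
    return any(instance_mem_mib - reserved >= task_mem_mib
               for reserved in reserved_per_instance_mib)


reserved = [2900, 2800, 2700]           # MiB already reserved on each instance
pct = cluster_reservation_pct(reserved, 4096)
print(round(pct, 1))                    # 68.4, still under a 70% scale-up threshold
print(can_place(1500, reserved, 4096))  # False, no single instance has 1500 MiB free
```

Here the alarm stays quiet while the task cannot be placed, so with tasks this large the scale-up threshold would need to sit lower than 70%.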

As part of the re:Invent 2019 conference, AWS announced cluster auto scaling for Amazon ECS. Clusters configured with auto scaling can now add more capacity when needed and remove capacity that is not necessary. You can find more information about this in the documentation.

However, depending on what you're trying to run, AWS Fargate could be a better option. Fargate allows you to run containers without provisioning and managing the underlying infrastructure; i.e., you don't have to deal with any EC2 instances. With Fargate, you make an API call to run your container, the container runs, and there's nothing to clean up once it stops. Fargate is billed per second (with a 1-minute minimum) and is priced based on the amount of CPU and memory allocated (see the AWS pricing documentation for details).
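As a rough way to estimate what an on-demand container would cost under that billing model, something like the following works. The function names and the rounding up to whole seconds are my assumptions, and the hourly rates are deliberately left as parameters: look up the current per-vCPU and per-GB rates for your region rather than trusting any number here.

```python
import math


def billable_seconds(run_seconds, minimum_seconds=60):
    """Per-second billing with a one-minute minimum (rounding up is assumed)."""
    return max(minimum_seconds, math.ceil(run_seconds))


def fargate_cost(run_seconds, vcpu, memory_gb, vcpu_hour_rate, gb_hour_rate):
    """Estimated cost of one task run, given hypothetical hourly rates."""
    hours = billable_seconds(run_seconds) / 3600.0
    return hours * (vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate)


print(billable_seconds(10))    # 60, a short run is still charged the minimum
print(billable_seconds(90.2))  # 91
```

For long-running containers that are started and stopped one at a time, this makes the cost of each container proportional to its own runtime, with no idle instances to pay for.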

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please credit this site or the original source.