
CloudFormation AutoScalingGroup not waiting for signal on update/scale-up

I'm working with a CloudFormation template that brings up as many instances as I request, and I want to wait for them to finish initialising (via User Data) before the stack creation/update is considered complete.

The Expectation

Creating or updating the stack should wait for signals from all newly created instances, so as to ensure that their initialisation is complete.

I don't want the stack creation or update to be considered successful if any of the created instances fail to initialise.

The Reality

CloudFormation only seems to wait for signals from instances when the stack is first created. Updating the stack and increasing the number of instances seems to disregard signalling. The update operation finishes successfully very quickly, whilst instances are still being initialised.

Instances created as a result of updating the stack can fail to initialise, but the update action would've already been considered a success.

The Question

Using CloudFormation, how can I make the reality meet the expectation?

I want the behaviour that applies when the stack is created to also apply when the stack is updated.

Similar Questions

I have found only the following question that matches my problem: UpdatePolicy in Autoscaling group not working correctly for AWS CloudFormation update

It's been open for a year and has not received an answer.

I'm creating another question as I've more information to add, and I'm not sure if these particulars will match those of the author in that question.

Reproducing

To demonstrate the problem, I've created a template based on the example beneath the Auto Scaling Group header on this AWS documentation page, which includes signalling.

The template has been adapted as follows:

  • It uses an Ubuntu AMI (in region ap-northeast-1). The cfn-signal command has been bootstrapped and called as necessary considering this change.
  • A new parameter dictates how many instances to launch in the auto scaling group.
  • A sleep time of 2 minutes has been added before signalling, to simulate the time spent whilst initialising.

Here's the template, saved to template.yml:

Parameters:
  DesiredCapacity:
    Type: Number
    Description: How many instances would you like in the Auto Scaling Group?

Resources:
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AvailabilityZones: !GetAZs ''
      LaunchConfigurationName: !Ref LaunchConfig
      MinSize: !Ref DesiredCapacity
      MaxSize: !Ref DesiredCapacity
    CreationPolicy:
      ResourceSignal:
        Count: !Ref DesiredCapacity
        Timeout: PT5M
    UpdatePolicy:
      AutoScalingScheduledAction:
        IgnoreUnmodifiedGroupSizeProperties: true
      AutoScalingRollingUpdate:
        MinInstancesInService: 1
        MaxBatchSize: 2
        PauseTime: PT5M
        WaitOnResourceSignals: true

  LaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: ami-b7d829d6
      InstanceType: t2.micro
      UserData:
        'Fn::Base64':
          !Sub |
            #!/bin/bash -xe
            sleep 120

            apt-get -y install python-setuptools
            TMP=`mktemp -d`
            curl https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz | \
              tar xz -C $TMP --strip-components 1
            easy_install $TMP

            /usr/local/bin/cfn-signal -e $? \
              --stack ${AWS::StackName} \
              --resource AutoScalingGroup \
              --region ${AWS::Region}

Now I create the stack with a single instance, via:

$ aws cloudformation create-stack \
  --region=ap-northeast-1 \
  --stack-name=asg-test \
  --template-body=file://template.yml \
  --parameters ParameterKey=DesiredCapacity,ParameterValue=1

After waiting a few minutes for the creation to complete, let's look at some key stack events:

$ aws cloudformation describe-stack-events \
  --region=ap-northeast-1 \
  --stack-name=asg-test

    ...
    {
        "Timestamp": "2017-02-03T05:36:45.445Z",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        ...
        "ResourceStatus": "CREATE_COMPLETE",
        ...
    },
    {
        "Timestamp": "2017-02-03T05:36:42.487Z",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        ...
        "ResourceStatusReason": "Received SUCCESS signal with UniqueId ...",
        "ResourceStatus": "CREATE_IN_PROGRESS"
    },
    {
        "Timestamp": "2017-02-03T05:33:33.274Z",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        ...
        "ResourceStatusReason": "Resource creation Initiated",
        "ResourceStatus": "CREATE_IN_PROGRESS",
        ...
    }
    ...

You can see that the auto scaling group started initiating at 05:33:33. At 05:36:42 (3 minutes after initiation), it received a success signal. This allowed the auto scaling group to reach its own success status only moments after, at 05:36:45.

That's awesome - working like a charm.

Now let's try increasing the number of instances in this auto scaling group to 2 by updating the stack:

$ aws cloudformation update-stack \
  --region=ap-northeast-1 \
  --stack-name=asg-test \
  --template-body=file://template.yml \
  --parameters ParameterKey=DesiredCapacity,ParameterValue=2

After waiting a much shorter time for the update to complete, let's look at some of the new stack events:

$ aws cloudformation describe-stack-events \
  --region=ap-northeast-1 \
  --stack-name=asg-test

    {
        "ResourceStatus": "UPDATE_COMPLETE",
        ...
        "ResourceType": "AWS::CloudFormation::Stack",
        ...
        "Timestamp": "2017-02-03T05:45:47.063Z"
    },
    ...
    {
        "ResourceStatus": "UPDATE_COMPLETE",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        "Timestamp": "2017-02-03T05:45:43.047Z"
    },
    {
        "ResourceStatus": "UPDATE_IN_PROGRESS",
        ...,
        "LogicalResourceId": "AutoScalingGroup",
        "Timestamp": "2017-02-03T05:44:20.845Z"
    },
    {
        "ResourceStatus": "UPDATE_IN_PROGRESS",
        ...
        "ResourceType": "AWS::CloudFormation::Stack",
        ...
        "Timestamp": "2017-02-03T05:44:15.671Z",
        "ResourceStatusReason": "User Initiated"
    },
    ....

Now you can see that whilst the auto scaling group started updating at 05:44:20, it completed at 05:45:43 - that's less than one and a half minutes to completion, which shouldn't be possible considering a sleep time of 120 seconds in the user data.

The stack update then proceeds to completion without the auto scaling group ever having received any signals.
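
One way to confirm the missing signal from the CLI, rather than by reading the full event dump, is to filter the events for the group and count the signal messages. This is a minimal sketch: the function names are made up for illustration, and it assumes the reason text keeps the "Received SUCCESS signal" wording shown in the creation events above.

```shell
# Print timestamp, status, and reason for every event on one logical resource.
# Arguments: stack name, logical resource ID, region.
asg_events() {
  aws cloudformation describe-stack-events \
    --region "$3" \
    --stack-name "$1" \
    --query "StackEvents[?LogicalResourceId=='$2'].[Timestamp,ResourceStatus,ResourceStatusReason]" \
    --output text
}

# Count the events whose reason mentions a received success signal.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_signals() {
  asg_events "$@" | grep -c 'Received SUCCESS signal' || true
}
```

Calling count_signals asg-test AutoScalingGroup ap-northeast-1 before and after the update should show whether the count changed; if the update really ignored signalling, it stays the same.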

The new instance does indeed exist.

In my real use case I've SSHed into one of these new instances to find that it was still in the process of initialising even after the stack update completed.

What I've Tried

I've read and re-read the documentation surrounding CreationPolicy and UpdatePolicy, but have failed to identify what I'm missing.

Taking a look at the update policy in use above, I don't understand what it's actually doing. Why is WaitOnResourceSignals true, but it's not waiting? Is it serving some other purpose?

Or are these new instances not falling under the "rolling update" policy? If they don't belong there, then I'd expect them to fall under the creation policy, but that doesn't seem to apply either.

As such, I don't really know what else to try.

I have a sneaking feeling that it's functioning as designed/expected, but if it is, then what's the point of that WaitOnResourceSignals property, and how can I meet the expectation set above?

The AutoScalingRollingUpdate policy handles rotating out an entire set of instances in an Auto Scaling group in response to changes to the underlying LaunchConfiguration. It doesn't apply to individual changes to the number of instances in the existing group. According to the UpdatePolicy Attribute documentation,

The AutoScalingReplacingUpdate and AutoScalingRollingUpdate policies apply only when you do one or more of the following:

  • Change the Auto Scaling group's AWS::AutoScaling::LaunchConfiguration.
  • Change the Auto Scaling group's VPCZoneIdentifier property.
  • Update an Auto Scaling group that contains instances that don't match the current LaunchConfiguration.

Changing the Auto Scaling group's DesiredCapacity property is not in this list, so the AutoScalingRollingUpdate policy does not apply to this type of change.

As far as I know, it is not possible (using standard AWS CloudFormation resources) to delay the completion of a Stack Update modifying DesiredCapacity until any new instances added to the Auto Scaling Group are fully provisioned.

Here are some alternative options:

  1. Instead of modifying only DesiredCapacity, modify a LaunchConfiguration property at the same time. This will trigger an AutoScalingRollingUpdate to the desired capacity (the downside is that it will also update existing instances, which may not actually need to be modified).
  2. Add an AWS::AutoScaling::LifecycleHook resource to your Auto Scaling Group, and call aws autoscaling complete-lifecycle-action in addition to cfn-signal, to signal lifecycle-hook completion. This won't delay your CloudFormation stack update as desired, but it will delay the individual auto-scaled instances from entering the InService state until the lifecycle signal is received. (See Lifecycle Hooks documentation for more info.)
  3. As an extension to #2, it should be possible to add a Lifecycle Hook to your Auto Scaling group, as well as a Custom Resource that polls your Auto Scaling Group and only completes when the Auto Scaling group contains the DesiredCapacity number of instances all in the InService state.
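
Options 2 and 3 could be sketched roughly as below. This is not a tested implementation. It assumes a launch lifecycle hook (transition autoscaling:EC2_INSTANCE_LAUNCHING) already exists on the group, that the AWS CLI is installed on the instance, and that the instance profile allows autoscaling:CompleteLifecycleAction. The function names are hypothetical, and the polling runs from the machine that issued update-stack as a simpler stand-in for the Custom Resource described in option 3.

```shell
# Instance side (end of User Data): tell Auto Scaling this instance has finished
# launching, so the lifecycle hook lets it move on to InService.
# Arguments: hook name, group name, region, result (CONTINUE or ABANDON).
complete_launch_hook() {
  # Instance ID from the EC2 instance metadata service (IMDSv1 assumed here).
  instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
  aws autoscaling complete-lifecycle-action \
    --lifecycle-hook-name "$1" \
    --auto-scaling-group-name "$2" \
    --region "$3" \
    --instance-id "$instance_id" \
    --lifecycle-action-result "$4"
}

# Client side: count instances of the group in the InService lifecycle state.
# Arguments: group name, region. Assumes the group exists.
in_service_count() {
  aws autoscaling describe-auto-scaling-groups \
    --region "$2" \
    --auto-scaling-group-names "$1" \
    --query "length(AutoScalingGroups[0].Instances[?LifecycleState=='InService'])" \
    --output text
}

# Client side: poll until enough instances are InService, or give up.
# Arguments: group name, region, expected count, max attempts, seconds between attempts.
# Returns 0 once the expected count is reached, 1 if attempts run out.
wait_for_in_service() {
  attempt=0
  while [ "$attempt" -lt "$4" ]; do
    if [ "$(in_service_count "$1" "$2")" -ge "$3" ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$5"
  done
  return 1
}
```

Without the lifecycle hook, instances reach InService before User Data finishes, so the polling is only meaningful in combination with it. The group name is the physical ID that CloudFormation generated, which aws cloudformation describe-stack-resource can report for the AutoScalingGroup logical ID.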

The rolling update only works for existing instances. The documentation says:

Rolling updates enable you to specify whether AWS CloudFormation updates instances that are in an Auto Scaling group in batches or all at once.

So to test this, create a stack based on your template. Then make a small modification to the launch config (e.g. change sleep 120 to sleep 121) and update the stack. Now you should see a rolling update.
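
The test described above might be scripted roughly like this, as a sketch only. The helper names and the output file name are illustrative, and it assumes the template still contains the literal line sleep 120 from the question.

```shell
# Write a copy of the template whose User Data sleeps 121 seconds instead of 120.
# Any change to the launch configuration would do; this one is harmless.
# Arguments: source template path, destination path.
bump_sleep() {
  sed 's/sleep 120$/sleep 121/' "$1" > "$2"
}

# Update the stack with the modified template, keeping the current capacity.
# Arguments: stack name, region, template path.
update_with_template() {
  aws cloudformation update-stack \
    --region "$2" \
    --stack-name "$1" \
    --template-body "file://$3" \
    --parameters ParameterKey=DesiredCapacity,UsePreviousValue=true
}
```

For example, bump_sleep template.yml template-rolling.yml followed by update_with_template asg-test ap-northeast-1 template-rolling.yml changes only the launch configuration, so any waiting seen in the stack events comes from the rolling update rather than from a capacity change.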
