
Running a command on EC2 launch and shutdown in an auto-scaling group

I'm running a Docker swarm deployed on AWS. The setup is an auto-scaling group of EC2 instances that each act as Docker swarm nodes.

When the auto-scaling group scales out (spawns a new instance) I'd like to run a command on the instance to join the Docker swarm (i.e. docker swarm join ...), and when it scales in (shuts down instances) to leave the swarm (docker swarm leave).

I know I can do the first one with user data in the launch configuration, but I'm not sure how to act on shutdown. I'd like to make use of lifecycle hooks, and the docs mention I can run custom actions on launch/terminate, but they never explain exactly how to do this. It should be possible without sending SQS/SNS/CloudWatch events, right?
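
For reference, the scale-out half can be handled with user data along these lines; the manager address and worker join token below are placeholders for values you would really fetch from somewhere you control (e.g. SSM Parameter Store):

    #!/bin/bash
    # User data: runs on first boot of each new instance in the auto-scaling group.
    # MANAGER_ADDR and WORKER_TOKEN are placeholder values, not real ones.
    MANAGER_ADDR="10.0.0.10:2377"
    WORKER_TOKEN="SWMTKN-1-xxxx"

    docker swarm join --token "$WORKER_TOKEN" "$MANAGER_ADDR"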

My AMI is a custom one based off of Ubuntu 16.04.

Thanks.

One of the core issues is that removing a node from a Swarm is currently a two- or three-step process when done gracefully, and some of those steps can't be done on the node that's leaving (the commands are sketched below the list):

  1. docker node demote (run on a manager), if the leaving node is a manager
  2. docker swarm leave on the leaving node
  3. docker node rm on a manager
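
A minimal sketch of that sequence, where NODE is a placeholder for the ID or hostname of the leaving node:

    # 1. On a manager, only needed if the leaving node is itself a manager:
    docker node demote NODE

    # 2. On the leaving node:
    docker swarm leave

    # 3. Back on a manager, once the node shows up as down:
    docker node rm NODE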

Step 3 is the tricky one, because it requires you to do one of three things to complete the removal process:

  1. Put something on a worker that would let it do things on a manager remotely (SSH to a manager with sudo permissions, or access to a manager's Docker API). Not a good idea: it breaks the security model of "workers can't do manager things" and greatly increases risk, so it's not recommended. We want our managers to stay secure and our workers to have no control over, or visibility into, the swarm.

  2. (best if possible) Set up an external solution so that on EC2 node removal, a job runs to SSH or call the API into a manager and remove the node from the swarm. I've seen people do this, but can't remember a link/repo with full details on using a Lambda, etc. to handle the lifecycle hook; a rough sketch of the manager-side piece follows this list.

  3. Set up a simple cron on a single manager (or, preferably, as a manager-only service running a cron container) that removes workers that are marked down (see the second sketch after this list). This is a blunt approach and has edge cases where you could potentially remove a node that still exists but is considered down/unhealthy by swarm, though I've not heard of that happening. If it were fancy, it could validate with AWS that the node is indeed gone before removing it.
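
For option 2, I don't have a canonical example to point to, but the manager-side piece could be a script along these lines, run remotely (for example via SSM Run Command kicked off by whatever handles the termination lifecycle hook). The hostname argument and the matching logic are assumptions about your setup, not a definitive implementation:

    #!/bin/bash
    # Hypothetical sketch: run on a Swarm manager when an instance is terminating.
    # $1 is assumed to be the hostname of the instance being terminated.
    set -euo pipefail
    TERMINATING_HOSTNAME="$1"

    # Find the swarm node whose hostname matches the terminating instance.
    NODE_ID=$(docker node ls --format '{{.ID}} {{.Hostname}}' \
      | awk -v h="$TERMINATING_HOSTNAME" '$2 == h {print $1}')

    if [ -n "$NODE_ID" ]; then
      # Demote first in case it was a manager (harmless failure on a worker),
      # then force-remove it from the node list.
      docker node demote "$NODE_ID" || true
      docker node rm --force "$NODE_ID"
    fi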
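
For option 3, the blunt cron job could be as simple as the sketch below. It assumes any node swarm marks as down really is gone; a fancier version would confirm with the AWS API before removing anything:

    #!/bin/bash
    # Run periodically on a single manager (or from a manager-only cron container):
    # remove every node that swarm currently reports as Down.
    set -euo pipefail

    docker node ls --format '{{.ID}} {{.Status}}' \
      | awk '$2 == "Down" {print $1}' \
      | while read -r NODE_ID; do
          docker node rm --force "$NODE_ID"
        done

Running it every few minutes from crontab is plenty; the only cost of waiting is a longer list of down nodes.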

WORST CASE, if a node goes down hard and doesn't do any of the above, it's not horrible, just not ideal for graceful handling of user/db connections. After 30s a node is considered down and Service tasks will be re-created on healthy nodes. A long list of workers marked down in the swarm node list doesn't really affect your Services; it's just unsightly (as long as there are enough healthy workers).

THERE'S A FEATURE REQUEST on GitHub to make this removal easier. I've commented there about what I'm seeing in the wild. Feel free to post your story and use case in the SwarmKit repo.
