Running command on EC2 launch and shutdown in auto-scaling group

I'm running a Docker swarm deployed on AWS. The setup is an auto-scaling group of EC2 instances that each act as Docker swarm nodes.

When the auto-scaling group scales out (spawns a new instance), I'd like to run a command on the instance to join the Docker swarm (i.e. docker swarm join ...), and when it scales in (shuts down instances), to leave the swarm (docker swarm leave).

I know I can do the first one with user data in the launch configuration, but I'm not sure how to act on shutdown. I'd like to make use of lifecycle hooks, and the docs mention I can run custom actions on launch/terminate, but they never explain just how to do this. It should be possible to do without sending SQS/SNS/CloudWatch events, right?

My AMI is a custom one based on Ubuntu 16.04.

Thanks.

One of the core issues is that removing a node from a Swarm gracefully is currently a two- or three-step action, and some of those steps can't be done on the node that's leaving:

  1. docker node demote, if the leaving node is a manager
  2. docker swarm leave on the leaving node
  3. docker node rm on a manager
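The ordering above can be sketched as a small Python helper that only builds the command list and records where each command has to run. This is a minimal illustration, not a tested tool: the function name `removal_steps` and the "manager" / "leaving-node" labels are made up for this example, and the last step uses `docker node rm`, the Docker CLI command that removes a node from the swarm.

```python
from typing import List, Tuple

# (where the command has to run, argv to execute there)
Step = Tuple[str, List[str]]


def removal_steps(node: str, is_manager: bool) -> List[Step]:
    """Return the graceful-removal commands for `node`, in order."""
    steps: List[Step] = []
    if is_manager:
        # Step 1: `docker node ...` commands only work on a manager.
        steps.append(("manager", ["docker", "node", "demote", node]))
    # Step 2: has to run on the node that is leaving.
    steps.append(("leaving-node", ["docker", "swarm", "leave"]))
    # Step 3: has to run on a manager; a worker cannot remove itself.
    steps.append(("manager", ["docker", "node", "rm", node]))
    return steps


if __name__ == "__main__":
    # "worker-1" is a hypothetical node name.
    for where, argv in removal_steps("worker-1", is_manager=False):
        print(f"[{where}] {' '.join(argv)}")
```

The split between "leaving-node" and "manager" is the whole point: whatever automation is used has to be able to run the last command somewhere other than the instance being terminated.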

Step 3 is the tricky one, because completing the removal requires you to do one of three things:

  1. Put something on a worker that would let it do things on a manager remotely (SSH to a manager with sudo permissions, or Docker manager API access). Not a good idea. This breaks the security model of "workers can't do manager things" and greatly increases risk, so it's not recommended. We want our managers to stay secure, and our workers to have no control over or visibility into the swarm.

  2. (Best if possible) Set up an external solution so that when an EC2 node is removed, a job runs that connects to a manager over SSH or the API and removes the node from the swarm. I've seen people do this, but can't remember a link/repo with full details on using a Lambda, etc. to deal with the lifecycle hook.

  3. Set up a simple cron job on a single manager (or preferably a manager-only service running a cron container) that removes workers that are marked down. This is a blunt approach and has edge cases where you could delete a node that still exists but is considered down/unhealthy by the swarm, though I've not heard of that happening. A fancier version could check with AWS that the node is indeed gone before removing it.
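As a rough illustration of option 3, the cron job could be a short Python script run on a manager. This is only a sketch under assumptions: the function names are invented for this example, it relies on `docker node ls` with `--filter role=worker` and a `--format` template printing the ID and status of each worker, and it does not include the AWS check mentioned above.

```python
import subprocess
from typing import List

# Assumed listing command: workers only, one "<id> <status>" pair per line.
LIST_CMD = [
    "docker", "node", "ls",
    "--filter", "role=worker",
    "--format", "{{.ID}} {{.Status}}",
]


def down_worker_ids(listing: str) -> List[str]:
    """Pick the IDs of workers reported as Down from the listing text."""
    ids = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lower() == "down":
            ids.append(parts[0])
    return ids


def prune_down_workers() -> List[str]:
    """Run on a manager: remove every worker the swarm reports as Down."""
    result = subprocess.run(LIST_CMD, check=True, capture_output=True, text=True)
    removed = []
    for node_id in down_worker_ids(result.stdout):
        # No --force here, so Docker should refuse nodes that are not down.
        subprocess.run(["docker", "node", "rm", node_id], check=True)
        removed.append(node_id)
    return removed
```

A crontab entry on the manager would then call `prune_down_workers()` every few minutes. Keeping the parsing in `down_worker_ids` separate from the Docker calls makes it easy to test the selection logic without a running swarm.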

WORST CASE, if a node goes down hard and doesn't do any of the above, it's not horrible, just not ideal for graceful management of user/DB connections. After 30s a node is considered down and Service tasks are re-created on healthy nodes. A long list of workers marked down in the swarm node list doesn't really affect your Services; it's just unsightly (as long as there are enough healthy workers).

THERE'S A FEATURE REQUEST on GitHub to make this removal easier. I've commented on what I'm seeing in the wild. Feel free to post your story and use case in the SwarmKit repo.
