Manage multiple AWS accounts

Posted 2018-02-03 16:00:21. Tags: amazon-web-services, automation, monitoring, failover, self-healing

I would like to know of a system with which I can keep track of multiple AWS accounts: 130+ accounts, each containing 200+ servers.
I want to know methods to keep track of machine failures, service failures, etc.
I also want to know how I can automatically bring up a replacement machine if the underlying hardware fails or a spot instance is terminated.
I'm open to all solutions, including Chef/Terraform automation, healing scripts, etc.

You guys will be saving me a lot of sleepless nights :)

Thanks in advance!!

2 answers

This is purely my take on how to approach your problem.

1) For managing and keeping track of multiple AWS accounts, you can use AWS Organizations. It lets you centrally manage all the other 130+ accounts from a single management (master) account. You can enable consolidated billing as well.
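As a rough sketch of the inventory side of point 1, the snippet below lists the active member accounts of an organization with boto3. It assumes the code runs with credentials for the organization's management account; `iter_active_accounts` and `print_accounts` are illustrative names, not part of any AWS tooling.

```python
from typing import Any, Dict, Iterator


def iter_active_accounts(org_client: Any) -> Iterator[Dict[str, Any]]:
    """Yield every ACTIVE account returned by the Organizations ListAccounts API."""
    paginator = org_client.get_paginator("list_accounts")
    for page in paginator.paginate():
        for account in page.get("Accounts", []):
            # Skip suspended or closing accounts; they have nothing to monitor.
            if account.get("Status") == "ACTIVE":
                yield account


def print_accounts() -> None:
    """Print the ID and name of each active account (needs real AWS credentials)."""
    import boto3  # imported here so the helper above works with any client object

    org = boto3.client("organizations")
    for account in iter_active_accounts(org):
        print(account["Id"], account["Name"])
```

Calling `print_accounts()` gives you the list of account IDs that the monitoring and healing steps below would iterate over.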

2) As for keeping track of failures, you may need to customize this according to your requirements. For example, you can build a microservice on Docker containers or ECS whose sole purpose is to keep track of failures, generate a report, and push it to S3 on a daily basis. You can then create a dashboard in Amazon QuickSight from those reports in S3.

Another microservice can remediate the failures. It just depends on how exhaustive and fine-grained you want your implementation to be.
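A minimal sketch of what the reporting service's core could look like, assuming you already have a boto3 EC2 client for each account (for example by assuming a cross-account role) and an S3 client for a central bucket. The function names, the `failure-reports/` key layout, and the report fields are my own placeholders; only EC2 status checks are covered here, not service-level failures.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def find_impaired_instances(ec2_client: Any) -> List[Dict[str, str]]:
    """Return instances whose system or instance status check is 'impaired'."""
    impaired = []
    paginator = ec2_client.get_paginator("describe_instance_status")
    # IncludeAllInstances=True also returns instances that are not running.
    for page in paginator.paginate(IncludeAllInstances=True):
        for status in page.get("InstanceStatuses", []):
            system = status.get("SystemStatus", {}).get("Status", "")
            instance = status.get("InstanceStatus", {}).get("Status", "")
            if "impaired" in (system, instance):
                impaired.append({
                    "instance_id": status["InstanceId"],
                    "state": status.get("InstanceState", {}).get("Name", ""),
                    "system_status": system,
                    "instance_status": instance,
                })
    return impaired


def upload_report(s3_client: Any, bucket: str, account_id: str,
                  failures: List[Dict[str, str]]) -> str:
    """Write one JSON report per account per day and return its S3 key."""
    now = datetime.now(timezone.utc)
    key = f"failure-reports/{account_id}/{now:%Y-%m-%d}.json"  # hypothetical layout
    body = json.dumps({
        "account": account_id,
        "generated_at": now.isoformat(),
        "failures": failures,
    })
    s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return key
```

Run on a daily schedule for each account, this produces the kind of per-day reports in S3 that a dashboard could read.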

3) Replacing spot instances when they are terminated can be achieved with simple Auto Scaling configurations. Here are some articles you may want to go through for ideas:

Using Spot Instances with On-Demand instances

Optimizing Spot Fleet+Docker with High Availability
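To make point 3 concrete, here is one possible sketch: an Auto Scaling group with a fixed desired capacity replaces any instance that is terminated, and a mixed instances policy lets it combine a small On-Demand base with Spot capacity. The mixed instances policy is my choice for illustration and is not taken from the articles above. The group name, launch template name, subnet IDs, and instance types are placeholders, and the launch template is assumed to exist already.

```python
from typing import Any, Dict, List


def build_asg_request(name: str, launch_template_name: str, subnet_ids: List[str],
                      instance_types: List[str], desired_capacity: int,
                      on_demand_base: int = 1) -> Dict[str, Any]:
    """Build keyword arguments for the Auto Scaling create_auto_scaling_group call."""
    return {
        "AutoScalingGroupName": name,
        # Min = max = desired: the group only replaces lost instances.
        "MinSize": desired_capacity,
        "MaxSize": desired_capacity,
        "DesiredCapacity": desired_capacity,
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template_name,
                    "Version": "$Latest",
                },
                # Several instance types give Spot more pools to draw from.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                "OnDemandBaseCapacity": on_demand_base,
                # Everything above the On-Demand base runs on Spot.
                "OnDemandPercentageAboveBaseCapacity": 0,
                "SpotAllocationStrategy": "capacity-optimized",
            },
        },
    }


def create_group(autoscaling_client: Any, request: Dict[str, Any]) -> None:
    """Send the request using a boto3 'autoscaling' client."""
    autoscaling_client.create_auto_scaling_group(**request)
```

With a boto3 client this would be used as `create_group(boto3.client("autoscaling"), build_asg_request("web", "web-template", ["subnet-aaa", "subnet-bbb"], ["m5.large", "m5a.large"], 4))`, where every argument is a made-up example value.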

AWS Organizations is useful for management. You can also look at a multiple account billing strategy and security strategy. A shared services account that holds your IAM users will make things easier.

Regarding tracking failures, you can set up automatic instance recovery using CloudWatch. CloudWatch can also have alarms defined that will email you when something unexpected happens, though setting them up individually could be time-consuming. At your scale, I think you should look into third-party tools.
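Since setting alarms up one by one is the tedious part, here is a hedged sketch of scripting it with boto3: one CloudWatch alarm per instance on the `StatusCheckFailed_System` metric, with the EC2 recover action attached. The alarm name prefix, the period and evaluation settings, and the optional SNS topic (for email notifications) are assumptions of mine; the recover action is, as far as I know, only supported on some instance types, so check that for your fleet.

```python
from typing import Any, Dict, Iterable, Optional


def recovery_alarm_params(instance_id: str, region: str,
                          notify_topic_arn: Optional[str] = None) -> Dict[str, Any]:
    """Build put_metric_alarm arguments that recover an instance on system failure."""
    actions = [f"arn:aws:automate:{region}:ec2:recover"]
    if notify_topic_arn:
        # An SNS topic with an email subscription delivers the notification.
        actions.append(notify_topic_arn)
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,  # two failed minutes in a row trigger recovery
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": actions,
    }


def create_recovery_alarms(cloudwatch_client: Any, instance_ids: Iterable[str],
                           region: str, notify_topic_arn: Optional[str] = None) -> int:
    """Create one recovery alarm per instance and return how many were created."""
    count = 0
    for instance_id in instance_ids:
        params = recovery_alarm_params(instance_id, region, notify_topic_arn)
        cloudwatch_client.put_metric_alarm(**params)
        count += 1
    return count
```

Passing `boto3.client("cloudwatch", region_name=region)` and the instance IDs of an account to `create_recovery_alarms` covers that account in a single run.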