
How to autoscale EMR task instances

Original post: 2016-07-30 10:36:25. Tags: amazon-web-services, cloud, amazon-sqs, amazon-emr, amazon-cloudwatch

I am using EMR with the task instance groups running as spot instances. I want to maintain a minimum number of task instances at all times. That is, whenever EMR terminates task instances because the spot price has risen above the bid we set, my application should launch replacement task instances with a slightly higher bid price.

My research:

Use CloudWatch to signal when a threshold is breached, and auto-scale the task instances. But from what I have read, there is no concept of auto-scaling in EMR.
Use CloudWatch to notify SQS when the threshold is breached, and have a service that constantly consumes the queue and expands the task instances.

Questions

Is there any auto-scaling available in EMR? If there is, my effort reduces to setting a threshold and the corresponding action that expands the task instances.
If you have any other approach to solving this problem, please suggest it.

4 solutions

How Spot Prices Work

When an Amazon EC2 instance is launched with a spot price (including when launched from Amazon EMR), the instance will start if the current spot price is below the provided bid price. If the spot price rises above the bid price, the instance is terminated. Instances are only charged the current spot price.

Therefore, the logic of launching a new spot instance with a "little higher bid price" is not necessary. The instance will always be charged the current spot price, so simply bid as high as you are willing to pay for a spot instance. You will either pay the spot price, which is at most your bid (great!), or your instance will be terminated because the price has gone higher than you are willing to pay (in which case you don't want to pay a "little higher" for the instance).

If you wish to "maintain minimum number of task instances" at all times, then either pay the normal (on-demand) EMR charge, which means the instances won't be terminated, or bid a particularly high price for the spot instances, such as 2x the normal price. Yes, you might occasionally pay more for instances, but on average your price will be quite low.
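To make the arithmetic concrete, here is a small sketch with made-up numbers. The hourly price series and the on-demand price are hypothetical, and the model is simplified: the instance runs in every hour where the spot price is at or below the bid and is simply absent otherwise.

```python
from typing import Sequence, Tuple

def spot_outcome(hourly_prices: Sequence[float], bid: float) -> Tuple[int, float]:
    """Return (hours_running, total_cost) for one instance under a fixed bid."""
    hours_running = 0
    total_cost = 0.0
    for price in hourly_prices:
        if price <= bid:
            hours_running += 1
            total_cost += price  # you are charged the spot price, not your bid
    return hours_running, total_cost

ON_DEMAND = 0.20  # hypothetical on-demand price per hour
prices = [0.05, 0.06, 0.30, 0.05, 0.07, 0.06]  # hypothetical spot prices, one per hour

low_hours, low_cost = spot_outcome(prices, bid=0.10)
high_hours, high_cost = spot_outcome(prices, bid=2 * ON_DEMAND)

print("low bid: ", low_hours, "hours, cost", round(low_cost, 2))
print("high bid:", high_hours, "hours, cost", round(high_cost, 2))
print("high bid average per hour:", round(high_cost / high_hours, 4))
```

With these numbers the low bid loses the instance during the price spike, while the high bid keeps it running through all six hours and still averages well under the assumed on-demand price.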

If you wish to be particularly sneaky, you could bid up to the normal price for the EC2 instances and then, if instances are terminated, launch more task nodes without using spot pricing. That way, your instances won't be terminated and you won't pay more than the normal EC2 price. However, you would have to terminate and replace those instances when the spot price drops, otherwise you are paying too much. That's why it might be better just to provide a high bid price on your spot instances.

Bottom line: Use spot pricing, but bid a high price. You'll get a good price most of the time.

AWS EMR does not have an autoscaling option available, but you can use a workaround and integrate autoscaling using AWS SQS. Here is a rough picture of what you can set up.

Launch your EMR cluster using spot instances.
Set up an SQS queue and create three triggers: one for a CPU threshold, a second for the EC2 spot instance termination notice, and a third for changing the spot instance bid price.
So if CPU usage increases, SQS will trigger an event to launch a new instance into the cluster. If there is a spot instance termination notice, SQS will trigger the launch of another instance to balance the load and send an event to change the bid price for launching another spot instance. (This is just a rough sketch, but I guess you will understand the logic.)
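One way to picture the consumer side of that queue is a small dispatcher that turns each message into a list of planned actions. This is only a sketch under assumptions: the message format (a JSON body with a `type` field), the type names `cpu_high`, `spot_termination` and `bid_change`, and the action names are all hypothetical, and the code only plans actions rather than calling AWS.

```python
import json
from typing import Any, Dict, List

def plan_actions(body: str, current_bid: float,
                 bid_step: float = 0.01, max_bid: float = 0.50) -> List[Dict[str, Any]]:
    """Map one queue message (hypothetical JSON format) to planned actions."""
    event = json.loads(body)
    kind = event.get("type")
    if kind == "cpu_high":
        # Trigger 1: CPU threshold breached, add capacity at the current bid.
        return [{"action": "add_task_instances", "count": 1, "bid": current_bid}]
    if kind == "spot_termination":
        # Trigger 2: an instance is going away, raise the bid (capped) and replace it.
        new_bid = min(round(current_bid + bid_step, 4), max_bid)
        return [
            {"action": "set_bid", "bid": new_bid},
            {"action": "add_task_instances", "count": 1, "bid": new_bid},
        ]
    if kind == "bid_change":
        # Trigger 3: explicit request to change the bid, still capped at max_bid.
        return [{"action": "set_bid", "bid": min(float(event["bid"]), max_bid)}]
    return []  # unknown message types are ignored

print(plan_actions('{"type": "spot_termination"}', current_bid=0.10))
```

Keeping the decision logic separate from the AWS calls makes it easy to test the rules without a live cluster; the service that polls the queue would then carry out each planned action.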

This is the guide to AWS SQS autoscaling:

https://docs.aws.amazon.com/autoscaling/latest/userguide/as-using-sqs-queue.html

As has been correctly pointed out, the EMR API provides all the necessary ingredients to 1) collect monitoring data, and 2) programmatically scale the cluster up and down.
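As a rough illustration of the scaling half, the sketch below reads the cluster's instance groups and changes the requested size of the first task group. It assumes `emr` is a boto3 EMR client (`boto3.client("emr")`) and uses its `list_instance_groups` and `modify_instance_groups` calls; the helper names are my own, pagination is ignored, and the cluster id in the usage comment is a placeholder.

```python
from typing import Any, Dict, List, Optional

def task_groups(emr: Any, cluster_id: str) -> List[Dict[str, Any]]:
    """Return the cluster's TASK instance groups (first page of results only)."""
    response = emr.list_instance_groups(ClusterId=cluster_id)
    return [g for g in response["InstanceGroups"] if g["InstanceGroupType"] == "TASK"]

def resize_first_task_group(emr: Any, cluster_id: str, target_count: int) -> Optional[str]:
    """Set the requested size of the first task group; return its id if changed."""
    groups = task_groups(emr, cluster_id)
    if not groups:
        return None  # the cluster has no task group to resize
    group = groups[0]
    if group["RequestedInstanceCount"] == target_count:
        return None  # already at the requested size, nothing to do
    emr.modify_instance_groups(
        ClusterId=cluster_id,
        InstanceGroups=[{"InstanceGroupId": group["Id"], "InstanceCount": target_count}],
    )
    return group["Id"]

# Usage against a real cluster (requires boto3 and AWS credentials):
# import boto3
# emr = boto3.client("emr")
# resize_first_task_group(emr, "j-XXXXXXXXXXXXX", 5)  # placeholder cluster id
```

Passing the client in as an argument keeps the functions easy to exercise with a stub object before pointing them at a real cluster.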

Basically, there are two main options to implement autoscaling for EMR clusters:

1. Autoscaling Loop: A process that runs on a server and continuously monitors the cluster's current load. Performance metrics (memory, CPU, I/O, etc.) can be collected at regular intervals and stored in a database. Autoscaling rules are evaluated against the performance metrics, and the cluster's task nodes are scaled up or down if required.
2. Event-Based Autoscaling: Using CloudWatch metrics (e.g., metrics for EMR or EC2), you can programmatically define triggers that fire under certain conditions (for instance, add nodes if the average CPUUtilization of all nodes exceeds 80%).

Both options have their pros and cons. The main advantage of option 2 is that it is a serverless approach (it does not require you to run your own server). Option 1, on the other hand, does require a server, but in return gives you more control to customize the logic of your scaling rules. It also lets you keep searchable records of the history of the scaling decisions.
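Whichever option you pick, the core of it is a rule that maps recent metrics to a target number of task nodes. A minimal sketch of such a rule follows. The 80% scale-up threshold comes from the example above; the scale-down threshold, node limits, step size and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class ScalingRule:
    scale_up_cpu: float = 80.0    # add nodes above this average CPU (%)
    scale_down_cpu: float = 30.0  # remove nodes below this average CPU (%), assumed
    min_nodes: int = 2            # never go below this many task nodes, assumed
    max_nodes: int = 10           # never go above this many task nodes, assumed
    step: int = 1                 # nodes added or removed per decision, assumed

def desired_task_nodes(cpu_samples: Sequence[float], current_nodes: int,
                       rule: ScalingRule = ScalingRule()) -> int:
    """Return the target task node count for the given recent CPU samples."""
    if not cpu_samples:
        return current_nodes  # no data, make no change
    average = sum(cpu_samples) / len(cpu_samples)
    if average > rule.scale_up_cpu:
        target = current_nodes + rule.step
    elif average < rule.scale_down_cpu:
        target = current_nodes - rule.step
    else:
        target = current_nodes
    # Clamp to the configured limits so a minimum is always maintained.
    return max(rule.min_nodes, min(rule.max_nodes, target))

print(desired_task_nodes([85.0, 90.0, 88.0], current_nodes=4))
```

In an autoscaling loop this function would be called on each polling interval; in the event-based variant the threshold comparison would live in the CloudWatch alarm and only the clamping would remain in your own code.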

You could take a look at Themis, an EMR autoscaling framework developed at Atlassian. Themis implements the autoscaling loop discussed in option 1 above. Current features include proactive as well as reactive autoscaling, support for spot and on-demand task nodes, and a Web UI, and the tool is very easy to configure.

I have had a similar problem, and I wanted to share one possible alternative. I have written a Java tool to dynamically resize an EMR cluster during processing. It might help you. Check it out at:

http://www.lopakalogic.com/articles/hadoop-articles/dynamically-resize-emr/

The source code is available on GitHub.