How to autoscale EMR task instances
I am using EMR with task instance groups running as spot instances, and I want to maintain a minimum number of task instances at all times. That is, whenever EMR terminates task instances because the spot price rises above the bid we set, my application should launch another task instance with a slightly higher bid price.
My research
Questions
How Spot Prices Work
When an Amazon EC2 instance is launched with a spot price (including when launched from Amazon EMR), the instance will start if the current spot price is below the provided bid price. If the spot price rises above the bid price, the instance is terminated. Instances are only charged the current spot price.
Therefore, the logic of launching a new spot instance with a "little higher bid price" is not necessary. The instance will always be charged the current spot price, so simply bid as high as you are willing to pay for a spot instance. You will either pay the current spot price, which is at or below your bid (great!), or your instance will be terminated because the price has gone higher than you are willing to pay (in which case you don't want to pay a "little higher" for the instance).
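The argument above can be illustrated with a small simulation. This is a minimal sketch of a simplified model, not of real AWS billing: it assumes one spot price per hour and made-up prices, and `simulate_spot` is a hypothetical helper name.

```python
def simulate_spot(bid, hourly_spot_prices):
    """Return (hours_run, total_cost) for one instance under a simplified spot model.

    Assumption: one price per hour, and the instance is terminated as soon as
    the spot price exceeds the bid.
    """
    hours_run = 0
    total_cost = 0.0
    for price in hourly_spot_prices:
        if price > bid:
            # Spot price rose above the bid: the instance is terminated.
            break
        # While running, the charge is the spot price, not the bid.
        hours_run += 1
        total_cost += price
    return hours_run, total_cost


prices = [0.03, 0.04, 0.06, 0.03]    # made-up hourly spot prices in USD
low = simulate_spot(0.05, prices)    # terminated when the price reaches 0.06
high = simulate_spot(0.20, prices)   # survives all four hours
```

For the hours in which both instances are running, the low and the high bid are charged exactly the same amount; the higher bid only changes how long the instance survives.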
If you wish to "maintain minimum number of task instances" at all times, then either pay the normal (non-spot) price, which means the instances won't be terminated, or bid a particularly high price for the spot instances, such as 2 x the normal price. Yes, you might occasionally pay more for instances, but on average your price will be quite low.
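As a rough sketch of the "bid high" approach, the following builds a spot task group whose bid is a multiple of the normal price and adds it to a running cluster. It assumes `emr_client` is a boto3 EMR client (for example `boto3.client("emr")`); the group name, the helper names, and the default multiplier are illustrative assumptions, and you would supply the normal price for your instance type yourself.

```python
def spot_task_group(instance_type, count, normal_price, multiplier=2.0):
    """Build an InstanceGroups entry for a spot task group with a generous bid."""
    return {
        "Name": "spot-task-nodes",  # hypothetical group name
        "InstanceRole": "TASK",
        "Market": "SPOT",
        # The bid is passed as a string in USD.
        "BidPrice": f"{normal_price * multiplier:.3f}",
        "InstanceType": instance_type,
        "InstanceCount": count,
    }


def add_spot_task_group(emr_client, cluster_id, group):
    """Add the group to a running cluster and return the new group IDs.

    emr_client is assumed to be a boto3 EMR client.
    """
    response = emr_client.add_instance_groups(
        JobFlowId=cluster_id,
        InstanceGroups=[group],
    )
    return response["InstanceGroupIds"]
```

For example, `spot_task_group("m4.large", 4, 0.10)` produces a group of four task nodes with a bid of `"0.200"`, where `0.10` stands in for whatever the normal hourly price actually is.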
If you wish to be particularly sneaky, you could bid up to the normal price for the EC2 instances and then, if instances are terminated, launch more task nodes without using spot pricing. That way, your instances won't be terminated and you won't pay more than the normal EC2 price. However, you would have to terminate and replace those instances when the spot price drops, otherwise you are paying too much. That's why it might be better just to provide a high bid price on your spot instances.
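The "sneaky" variant could be sketched roughly as follows: count the running task nodes and, if the count has dropped below the minimum, add non-spot task nodes to cover the shortfall. This assumes `emr_client` is a boto3 EMR client; the helper names and the group name are hypothetical, and pagination of `list_instance_groups` is ignored for brevity.

```python
def running_task_nodes(emr_client, cluster_id):
    """Sum the running instances across all TASK instance groups."""
    groups = emr_client.list_instance_groups(ClusterId=cluster_id)["InstanceGroups"]
    return sum(
        g["RunningInstanceCount"]
        for g in groups
        if g["InstanceGroupType"] == "TASK"
    )


def top_up_with_on_demand(emr_client, cluster_id, min_task_nodes, instance_type):
    """Add on-demand task nodes if fewer than min_task_nodes are running.

    Returns the number of nodes requested (0 if nothing was needed).
    """
    shortfall = min_task_nodes - running_task_nodes(emr_client, cluster_id)
    if shortfall <= 0:
        return 0
    emr_client.add_instance_groups(
        JobFlowId=cluster_id,
        InstanceGroups=[{
            "Name": "on-demand-fallback",  # hypothetical group name
            "InstanceRole": "TASK",
            "Market": "ON_DEMAND",         # no bid price: cannot be outbid
            "InstanceType": instance_type,
            "InstanceCount": shortfall,
        }],
    )
    return shortfall
```

Note that newly requested nodes take a while to show up as running, so calling this too frequently could request the same shortfall more than once. It also does nothing about replacing the fallback nodes once the spot price drops again.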
Bottom line: Use spot pricing, but bid a high price. You'll get a good price most of the time.
AWS EMR does not have an autoscaling option available. But you can use a workaround and integrate Auto Scaling using AWS SQS. This is a rough picture of what you can integrate.
This is the guide to Auto Scaling with AWS SQS:
https://docs.aws.amazon.com/autoscaling/latest/userguide/as-using-sqs-queue.html
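The answer does not spell out how the queue would drive scaling. One plausible reading, in the spirit of the linked guide, is to size the task group from the queue backlog. The sketch below is an assumption along those lines: `sqs_client` is assumed to be a boto3 SQS client, the helper names are hypothetical, and `acceptable_backlog_per_node` is a number you would have to choose for your workload.

```python
import math


def queue_depth(sqs_client, queue_url):
    """Read the approximate number of visible messages in the queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


def desired_task_nodes(depth, acceptable_backlog_per_node, min_nodes, max_nodes):
    """Number of task nodes needed to keep the per-node backlog acceptable."""
    if acceptable_backlog_per_node <= 0:
        raise ValueError("acceptable_backlog_per_node must be positive")
    needed = math.ceil(depth / acceptable_backlog_per_node)
    # Clamp to the allowed range so the cluster never drops below the minimum.
    return max(min_nodes, min(max_nodes, needed))
```

With 1000 queued messages and an acceptable backlog of 100 messages per node, `desired_task_nodes(1000, 100, 2, 20)` asks for 10 task nodes. An empty queue still keeps the minimum of 2.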
As has been correctly pointed out, the EMR API provides all the necessary ingredients to 1) collect monitoring data, and 2) programmatically scale the cluster up and down.
Basically, there are two main options to implement autoscaling for EMR clusters:
Both options have their pros and cons. The main advantage of option 2 is that it is a serverless approach (it does not require running your own server). Option 1, on the other hand, does require a server, but in return gives you more control to customize the logic of your scaling rules. It also lets you keep searchable records of the history of the scaling decisions.
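Whichever option is chosen, the core of the scaling logic is the same two steps: read a monitoring metric, then resize a task group. A minimal sketch of one iteration is shown here. It assumes `cw_client` and `emr_client` are boto3 CloudWatch and EMR clients; the choice of the `YARNMemoryAvailablePercentage` metric, the thresholds, the step size, and all function names are illustrative assumptions rather than anything prescribed above.

```python
from datetime import datetime, timedelta, timezone


def latest_metric(cw_client, cluster_id, metric_name, minutes=10):
    """Return the most recent 5-minute average of an EMR metric, or None."""
    now = datetime.now(timezone.utc)
    stats = cw_client.get_metric_statistics(
        Namespace="AWS/ElasticMapReduce",
        MetricName=metric_name,
        Dimensions=[{"Name": "JobFlowId", "Value": cluster_id}],
        StartTime=now - timedelta(minutes=minutes),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to be ordered, so sort by timestamp.
    points = sorted(stats["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Average"] if points else None


def decide_task_count(current, memory_available_pct, min_nodes, max_nodes,
                      low=15.0, high=75.0, step=2):
    """Scale out when free YARN memory is scarce, scale in when it is plentiful."""
    if memory_available_pct is None:
        return current  # no data: leave the cluster alone
    if memory_available_pct < low:
        target = current + step
    elif memory_available_pct > high:
        target = current - step
    else:
        target = current
    return max(min_nodes, min(max_nodes, target))


def scale_once(emr_client, cw_client, cluster_id, group_id, current,
               min_nodes, max_nodes):
    """Run one iteration: read the metric, decide, and resize if needed."""
    pct = latest_metric(cw_client, cluster_id, "YARNMemoryAvailablePercentage")
    target = decide_task_count(current, pct, min_nodes, max_nodes)
    if target != current:
        emr_client.modify_instance_groups(
            ClusterId=cluster_id,
            InstanceGroups=[{"InstanceGroupId": group_id, "InstanceCount": target}],
        )
    return target
```

Keeping `decide_task_count` free of AWS calls makes the scaling rule easy to test and to log, which fits the point about keeping records of scaling decisions.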
You could take a look at Themis, an EMR autoscaling framework developed at Atlassian. Themis implements the autoscaling loop as discussed in option 1 above. Current features include proactive as well as reactive autoscaling, support for spot/on-demand task nodes, and a Web UI, and the tool is very easy to configure.
I have had a similar problem, and I wanted to share one possible alternative. I have written a Java tool to dynamically resize an EMR cluster during processing. It might help you. Check it out at:
http://www.lopakalogic.com/articles/hadoop-articles/dynamically-resize-emr/
The source code is available on GitHub.