简体   繁体   中英

Capacity scheduler in Amazon Elastic MapReduce

I am totally new to Amazon Elastic MapReduce. I have a need that I want to use my custom scheduler, which is implemented based on Hadoop capacity scheduler, to schedule my jobs in Amazon Elastic MapReduce.

According to my current understanding, to achieve this, I can define only one stage in the job flow, and submit my custom jar file via SSH connection to the master node. However, I cannot find how can I edit the xml configuration files, like capacity-scheduler.xml in the master node. Anyone knows how to do that?

Moreover, if I want to add the dynamic sizing property onto it, can I dynamically tune the number of task nodes in the cluster, when the job is currently running? Or in per stage, the size of a cluster should remain the same? Thank you so much.

You should use a bootstrap action to change Hadoop configuration.

The following AWS doc can be referenced for Hadoop configuratio bootstrap action.
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html#PredefinedbootstrapActions_ConfigureHadoop

This blog article that I bookmarked also has some info. http://sujee.net/tech/articles/hadoop/amazon-emr-beyond-basics/

For changing the cluster size dynamically, one option is to use the AWS SDK.
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/calling-emr-with-java-sdk.html

Using the following interface you can modify the instance count of the instance group. http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/AmazonElasticMapReduce.html

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM