
AWS EMR cluster: scaling up didn't update dfs.replication from 1 to 2

I provisioned an AWS EMR HBase cluster with 1 master node and 1 core node (m5.xlarge). The cluster has no task nodes because I plan to use it only for storage. The hdfs-site.xml file on both nodes had dfs.replication set to 1, which makes sense. I then manually added 5 more core nodes, hoping EMR would bump the replication factor from 1 to 2 as described in its docs: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hdfs-config.html
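As I read the linked page, the default dfs.replication depends on cluster size: 1 for fewer than four core nodes, 2 for fewer than ten, and 3 otherwise. The function below is only a sketch of that rule; default_replication is a hypothetical helper (not an EMR or Hadoop tool), and the thresholds are my reading of the docs, so check them against your EMR release.

```shell
# Hypothetical helper: mirrors the dfs.replication defaults as I read them in
# the EMR HDFS configuration docs (thresholds are an assumption, verify them).
default_replication() {
  local core_nodes=$1   # number of core nodes in the cluster
  if [ "$core_nodes" -lt 4 ]; then
    echo 1
  elif [ "$core_nodes" -lt 10 ]; then
    echo 2
  else
    echo 3
  fi
}

default_replication 1   # prints 1 (the original 1-core cluster)
default_replication 6   # prints 2 (after scaling to 6 core nodes)
```

So at launch time a 6-core cluster would get 2, which is the value I expected after scaling.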

As I understand it, EMR sets the replication factor to 2 if the cluster is launched with 6 core nodes. But what happens in my case, where I scaled the cluster up manually after it was already running?

It looks like EMR won't do this automatically. After scaling the cluster up, I need to set the replication factor myself by reconfiguring the instance group on the running cluster: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html

Contents of instanceGroups.json:

[
  {
    "InstanceGroupId": "<ig-1xxxxxxx9>",
    "Configurations": [
      {
        "Classification": "hdfs-site",
        "Properties": {
          "dfs.replication": "2"
        },
        "Configurations": []
      }
    ]
  }
]
aws emr modify-instance-groups --cluster-id <j-2AL4XXXXXX5T9> \
  --instance-groups file://instanceGroups.json
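If you resize often, a small script can generate the reconfiguration file for you. This is a sketch under my own assumptions: write_replication_config is a hypothetical helper (not part of the AWS CLI), and it only writes the hdfs-site classification with dfs.replication for the instance group ID you pass in, using the same JSON shape that modify-instance-groups takes.

```shell
# Hypothetical helper: writes an EMR reconfiguration file that sets
# hdfs-site dfs.replication for a single instance group.
write_replication_config() {
  local group_id=$1     # instance group ID, e.g. the core group's ig-... ID
  local replication=$2  # desired dfs.replication value
  local out_file=$3     # path of the JSON file to write
  cat > "$out_file" <<EOF
[
  {
    "InstanceGroupId": "${group_id}",
    "Configurations": [
      {
        "Classification": "hdfs-site",
        "Properties": {
          "dfs.replication": "${replication}"
        },
        "Configurations": []
      }
    ]
  }
]
EOF
}
```

For example, run write_replication_config "<ig-1xxxxxxx9>" 2 instanceGroups.json and then the modify-instance-groups command above. Keep in mind that dfs.replication is applied when a file is written, so files already stored with replication 1 stay that way. Running hdfs dfs -setrep -w 2 / on the master node should raise existing files too: on a directory, -setrep applies recursively, and -w waits until re-replication finishes, which can take a while.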


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM