
How to limit a Hadoop MapReduce job to a certain number of nodes?

So, basically I have a system with 4 data nodes. However, to check the scalability of my Hadoop application, I want to test it with 1, 2, and 4 nodes. So, how can I limit the number of nodes used by Hadoop to only 1 or 2? I am using Hadoop 2.5.1 and I don't have admin rights on the system. Moreover, how can I also control the number of cores used by Hadoop on a node?

You need admin rights to do all of this.

how can I limit the number of nodes used by Hadoop to only 1 or 2?

Decommission 2 or 3 of the nodes (decommission 2 to test on two nodes, 3 to test on a single node); a sketch of the exclude-file approach is shown below.
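A minimal sketch of the exclude-file approach (the path /etc/hadoop/conf/yarn.exclude is an assumption — use whatever location your cluster's configuration points at). Add this property to yarn-site.xml:

<property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <!-- path is an assumption; point it at your cluster's exclude file -->
    <value>/etc/hadoop/conf/yarn.exclude</value>
</property>

Then list the hostnames of the nodes to take offline in that file, one per line, and tell the ResourceManager to re-read it:

yarn rmadmin -refreshNodes

If you also want HDFS to stop using those machines, the analogous setting is dfs.hosts.exclude in hdfs-site.xml, refreshed with hdfs dfsadmin -refreshNodes. All of this involves editing cluster configuration, which is why admin rights are needed.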

how can I also control the number of cores used by Hadoop on a node?

Set the config below in yarn-site.xml to allocate 8 vcores per node:

<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
</property>

Also update yarn.scheduler.capacity.resource-calculator in capacity-scheduler.xml, because the default DefaultResourceCalculator only takes memory into account:

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    <description>
      The ResourceCalculator implementation to be used to compare
      Resources in the scheduler.
      The default i.e. DefaultResourceCalculator only uses Memory while
      DominantResourceCalculator uses dominant-resource to compare
      multi-dimensional resources such as Memory, CPU etc.
    </description>
  </property>
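Once the scheduler accounts for CPU, a job can request more than one vcore per task. A sketch using the standard MapReduce per-task settings (the value 2 is just an example):

<property>
    <name>mapreduce.map.cpu.vcores</name>
    <!-- example value; default is 1 vcore per map task -->
    <value>2</value>
</property>
<property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <!-- example value; default is 1 vcore per reduce task -->
    <value>2</value>
</property>

These can go in mapred-site.xml or be set in the job configuration; YARN will then schedule containers so that no NodeManager exceeds its yarn.nodemanager.resource.cpu-vcores limit.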
