
How do I determine the number of Giraph workers to set in the -w argument?

I'm using an EC2 Hadoop cluster comprised of 20 c3.8xlarge machines, each having 60 GB RAM and 32 virtual CPUs. On every machine I set up the YARN and MapReduce settings as documented at https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-task-config.html , i.e. as shown below:

c3.8xlarge
Configuration Option    Default Value
mapreduce.map.java.opts -Xmx1331m
mapreduce.reduce.java.opts  -Xmx2662m
mapreduce.map.memory.mb 1664
mapreduce.reduce.memory.mb  3328
yarn.app.mapreduce.am.resource.mb   3328
yarn.scheduler.minimum-allocation-mb    32
yarn.scheduler.maximum-allocation-mb    53248
yarn.nodemanager.resource.memory-mb 53248

Now, what criteria should I use to determine the most appropriate number of workers for Giraph, i.e. what number should I pass to the -w argument? Are those criteria related to the settings above?

There is no single optimal number, but the maximum number of workers you can run in parallel can be calculated roughly as follows.

Every NodeManager has 53248 MB; multiply that by your slave node count.

Subtract a single yarn.app.mapreduce.am.resource.mb amount from that, since every job needs one application master.

Then divide that by the larger of your mapper or reducer memory to get the total number of MapReduce tasks that can run at once.
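The three steps above can be sketched as a quick back-of-the-envelope calculation. This is only an illustration plugging in the values from the question's configuration table; the variable names are mine, not part of any Hadoop or Giraph API.

```python
# Rough upper bound on parallel tasks, using the c3.8xlarge values
# from the configuration table above (all sizes in MB).
nodes = 20                  # slave node count
nodemanager_mb = 53248      # yarn.nodemanager.resource.memory-mb
am_mb = 3328                # yarn.app.mapreduce.am.resource.mb
map_mb = 1664               # mapreduce.map.memory.mb
reduce_mb = 3328            # mapreduce.reduce.memory.mb

total_mb = nodes * nodemanager_mb        # cluster-wide YARN memory
available_mb = total_mb - am_mb          # reserve one application master
max_tasks = available_mb // max(map_mb, reduce_mb)
print(max_tasks)  # rough ceiling for -w on this cluster
```

With these numbers, (20 × 53248 − 3328) / 3328 works out to 319 tasks, so the -w value should stay at or below that ceiling.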
