
How does the Hadoop scheduler work when the input file is smaller than the total map slot capacity?

In my situation I have 2 jobs in JobControl and 200 map slots, with a block size of 64 MB, so the full processing capacity in one wave should be 64 MB × 200 = 12.8 GB. But the first job's input is only 10 GB. What does Hadoop do when some map slots are left empty? Will it start processing the second job in the queue, or wait until the first job's map and reduce phases finish before processing the second job, or something else? Please suggest.
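(Working through those numbers, and taking 1 GB = 1024 MB: a 10 GB input at a 64 MB block size yields 10240 / 64 = 160 map tasks, so the first job fills 160 of the 200 map slots and leaves 40 free.)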

If you're using Hadoop's FIFO scheduler and running the 2 jobs as the same user, the second job will start running only when there are enough free map/reduce slots left over while the first job is running. Even when free slots exist, the scheduler gives preference to the first job over the second. So Hadoop will wait for the first job to free up enough slots before the second job's tasks are scheduled.
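Since the question submits the jobs through JobControl, here is a minimal sketch of that pattern, assuming job1 and job2 are already fully configured org.apache.hadoop.mapreduce.Job instances (the class name and group name below are illustrative). Note that JobControl only decides when each job is submitted; once submitted, the FIFO scheduler described above decides which tasks actually get slots.

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

    public class TwoJobRunner {
        // Wrap two already-configured Jobs and hand them to JobControl.
        // JobControl submits each job as soon as its dependencies are met;
        // the cluster scheduler (FIFO here) then assigns the actual slots.
        public static void runBoth(Job job1, Job job2) throws Exception {
            JobControl control = new JobControl("two-job-group");
            control.addJob(new ControlledJob(job1.getConfiguration()));
            control.addJob(new ControlledJob(job2.getConfiguration()));

            Thread runner = new Thread(control);  // JobControl is a Runnable
            runner.start();
            while (!control.allFinished()) {
                Thread.sleep(5000);               // poll until both jobs finish
            }
            control.stop();                       // shut down the control thread
        }
    }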

Usually it is not recommended to have files smaller than the input split size, because the namenode has to manage many more file inodes compared to a single large file.
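If the input genuinely consists of many files smaller than a block, one common mitigation (a sketch of a standard technique, not something prescribed by the answer above) is CombineTextInputFormat, which packs several small files into each split so you avoid launching one short-lived mapper per file; it does not, however, reduce the number of inodes the namenode tracks.

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SmallFileJobSetup {
        // Configure a job so that many small text files are packed into
        // combined splits of up to 128 MB, instead of one mapper per file.
        public static void useCombinedSplits(Job job) {
            job.setInputFormatClass(CombineTextInputFormat.class);
            FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
        }
    }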
