
Why multiple MapReduce jobs for one Pig / Hive job?

I am using Pig to run my Hadoop job. When I run the Pig script and then navigate to the YARN resource manager UI, I can see multiple MapReduce jobs being created for the same Pig job. I believe the same holds for Hive jobs as well.

Can anyone please explain the reasoning behind this? On what basis is one Pig job split into multiple MapReduce jobs? One of them happens to be TempletonControllerJob.

[screenshot: YARN resource manager UI]

Thanks

The Templeton Controller Job is like a parent job that launches another child MapReduce job; it basically controls the execution. It appears when the job is submitted through WebHCat (formerly called Templeton), whose REST API wraps each submission in a controller job.

Before executing, Pig first builds an execution plan: it scans all the steps in the Pig script and combines those that can run in a single MapReduce job. When two steps cannot be computed within one job (for example, each requires its own shuffle), it splits them into separate jobs. Once it has finished this combining and has determined the number of jobs and the steps within each, it starts the execution.
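As a rough illustration (the relation and file names below are hypothetical), a script with a GROUP followed by an ORDER BY typically cannot fit in one MapReduce job, because each operator needs its own shuffle; running `EXPLAIN` on the final relation prints the plan and shows the job boundaries:

```pig
-- Hypothetical input: tab-separated (user, amount) records.
raw    = LOAD 'input/sales.tsv' USING PigStorage('\t')
         AS (user:chararray, amount:double);
byUser = GROUP raw BY user;            -- job 1: shuffle keyed on user
totals = FOREACH byUser GENERATE group AS user,
                                 SUM(raw.amount) AS total;
ranked = ORDER totals BY total DESC;   -- further job(s): Pig's ORDER BY adds
                                       -- a sampling job plus a global sort job
STORE ranked INTO 'output/totals_by_user';

-- EXPLAIN ranked;  -- dumps the logical, physical, and MapReduce plans,
--                     making the split into multiple jobs visible
```

This is why one short Pig script can show up as several entries in the YARN UI: each shuffle boundary in the plan becomes its own MapReduce job.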
