
Why multiple MapReduce jobs for one Pig / Hive job?

I am using Pig to run my Hadoop job. When I run the Pig script and then navigate to the YARN Resource Manager UI, I can see multiple MapReduce jobs being created for the same Pig job. I believe it would be the same for Hive jobs as well.

Can anyone please explain the reasoning behind this? On what basis is one Pig job split into multiple MapReduce jobs? One of them happens to be TempletonControllerJob.

[Screenshot: YARN Resource Manager UI]

Thanks

TempletonControllerJob is a parent job that launches the actual child MapReduce job(s) and controls their execution. It appears when the job is submitted through the WebHCat (Templeton) REST API rather than directly from the command line.

Before executing, Pig builds an execution plan: it scans all the steps in the Pig script and combines steps that can be executed in a single MapReduce job. When two steps in the script cannot be computed in a single job (for example, each operation such as GROUP, JOIN, or ORDER BY typically needs its own reduce phase), it splits them into separate jobs. Once it has done this combining and worked out the number of jobs and the steps within each job needed to produce the final result, it starts the execution.
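As a hypothetical sketch (the file paths and field names below are made up for illustration), a script like this one typically compiles into two MapReduce jobs, because the GROUP and the ORDER BY each require their own shuffle/reduce phase. Running `EXPLAIN` on the final alias prints the MapReduce plan Pig has built, so you can see the split before anything runs:

```pig
-- Load some sample sales records (hypothetical path and schema)
sales = LOAD '/data/sales.csv' USING PigStorage(',')
        AS (store:chararray, amount:double);

-- Step 1: GROUP + SUM -> first MapReduce job (one shuffle on 'store')
by_store = GROUP sales BY store;
totals   = FOREACH by_store GENERATE group AS store,
                                     SUM(sales.amount) AS total;

-- Step 2: global ORDER BY -> second MapReduce job (another shuffle)
ranked = ORDER totals BY total DESC;

-- Print the MapReduce execution plan instead of running the script;
-- the output lists each MapReduce job and the operators assigned to it
EXPLAIN ranked;
```

If you replace `EXPLAIN ranked;` with `STORE ranked INTO '/out/ranked';` and run the script, you should see two MapReduce applications for it in the YARN Resource Manager UI (plus a TempletonControllerJob if it was submitted via WebHCat).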

