Oozie for multiple MapReduce jobs
I have a sequence of MapReduce jobs that need to be run. Is there any advantage to using Oozie for that, instead of having "one big driver" that runs the sequence?
I know that Oozie can run multiple actions of different types (e.g. a Pig script, a shell script, an MR job), but my concrete question is: should I split my two jobs and run them with Oozie, or keep a single jar that does both?
Oozie is a scheduler: crude, poorly documented, but a scheduler.
... well, don't use a scheduler.
PS: Luigi (Spotify) and Azkaban (LinkedIn) are alternative Hadoop schedulers.
[edit] An extra point to consider: if your "driver" crashes for whatever reason, you may not get a chance to send an alert; but if it runs from Oozie, the crash will be detected eventually (this may take as much as 30 min. in a corner case, e.g. AM job self-destruction due to YARN RM failover).
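To make the "one big driver" trade-off concrete, here is a minimal sketch of such a driver's control flow, written in plain Python rather than against the Hadoop API. The names `run_sequence`, `Step` and `alert` are hypothetical; in a real Java driver each step would presumably wrap a job submission such as `Job.waitForCompletion(true)`.

```python
from typing import Callable, List, Tuple

# Hypothetical step type: (name, function returning True on success).
Step = Tuple[str, Callable[[], bool]]


def run_sequence(steps: List[Step], alert: Callable[[str], None]) -> List[str]:
    """Run steps in order; stop and alert at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, run in steps:
        try:
            ok = run()
        except Exception as exc:  # a step that raises counts as failed
            alert(f"{name} raised {exc!r}")
            break
        if not ok:
            alert(f"{name} failed")
            break
        completed.append(name)
    return completed
```

Note that `alert` is only called while the driver process itself is still alive. If the process is killed outright, none of this code runs, which is the gap an external scheduler such as Oozie is meant to cover.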