
Oozie for multiple mapreduce jobs

I have a sequence of MapReduce jobs that need to be run. Is there any advantage to using Oozie for that, instead of having "one big driver" that runs the whole sequence?
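For reference, the "one big driver" approach boils down to running each job to completion, in order, and stopping at the first failure. The sketch below is a minimal Python stand-in, not a real Hadoop driver: each step is a hypothetical callable that returns True on success, playing the role that `Job.waitForCompletion(true)` plays in a Java MapReduce driver, and the step names are made up.

```python
from typing import Callable, List, Tuple

# A step is (name, callable). Each callable is a hypothetical stand-in for
# submitting one MapReduce job and blocking until it finishes; it returns
# True on success, like Job.waitForCompletion(true) in a Hadoop Java driver.
Step = Tuple[str, Callable[[], bool]]


def run_sequence(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    Raises RuntimeError naming the step that failed.
    """
    completed: List[str] = []
    for name, run in steps:
        if not run():
            raise RuntimeError(f"step {name!r} failed after {completed}")
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical two-job pipeline: the second job reads what the first wrote.
    pipeline = [
        ("extract", lambda: True),
        ("aggregate", lambda: True),
    ]
    print(run_sequence(pipeline))  # ['extract', 'aggregate']
```

Everything beyond this (retries, alerts, history, scheduling) has to be added to the driver by hand.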

I know that Oozie can be used to run multiple actions of different types, e.g. Pig scripts, shell scripts and MR jobs, but what I'm concretely interested in is this: should I split my two jobs and run them using Oozie, or have a single jar do it?
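If you do split the jobs, each one becomes a `map-reduce` action in an Oozie `workflow.xml`, chained through `ok`/`error` transitions to the next action or to a `kill` node. The Python sketch below builds such a skeleton with the standard library just to make the shape visible. It assumes workflow schema `uri:oozie:workflow:0.5`; the action names, paths and the `${jobTracker}`/`${nameNode}` parameters are placeholders, and only the `mapred.input.dir`/`mapred.output.dir` properties are set (your real mapper/reducer configuration would go alongside them).

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

WF_NS = "uri:oozie:workflow:0.5"  # assumed schema version


def build_workflow(name: str, jobs: List[Tuple[str, str, str]]) -> str:
    """Build a workflow.xml skeleton that runs MapReduce jobs in sequence.

    jobs: non-empty list of (action_name, input_dir, output_dir), all
    hypothetical. Each action goes to the next one on success and to a
    shared kill node named "fail" on error; the last one goes to "end".
    """
    app = ET.Element("workflow-app", {"xmlns": WF_NS, "name": name})
    ET.SubElement(app, "start", {"to": jobs[0][0]})
    for i, (action_name, in_dir, out_dir) in enumerate(jobs):
        nxt = jobs[i + 1][0] if i + 1 < len(jobs) else "end"
        action = ET.SubElement(app, "action", {"name": action_name})
        mr = ET.SubElement(action, "map-reduce")
        ET.SubElement(mr, "job-tracker").text = "${jobTracker}"
        ET.SubElement(mr, "name-node").text = "${nameNode}"
        conf = ET.SubElement(mr, "configuration")
        for key, value in (("mapred.input.dir", in_dir), ("mapred.output.dir", out_dir)):
            prop = ET.SubElement(conf, "property")
            ET.SubElement(prop, "name").text = key
            ET.SubElement(prop, "value").text = value
        ET.SubElement(action, "ok", {"to": nxt})
        ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Failed: [${wf:errorMessage(wf:lastErrorNode())}]"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical HDFS paths: job 2 consumes job 1's output.
    print(build_workflow("two-mr-jobs", [
        ("first-job", "/user/me/input", "/user/me/intermediate"),
        ("second-job", "/user/me/intermediate", "/user/me/output"),
    ]))
```

In practice you would write this XML by hand; the point is that the jobs stay in separate jars or classes and Oozie owns the sequencing and failure handling.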

Oozie is a scheduler: crude, poorly documented, but a scheduler.

  • If you don't need scheduling per se, or if cron on an edge node is sufficient
  • if you want to handle your workflow logic yourself (e.g. conditional branching, parallel executions that wait for stragglers, calling generic sub-workflows with ad hoc parameters, e-mail alerts on errors, <insert your pet feature here>), or don't need any fancy logic
  • if you handle your execution logs and state history yourself, or don't care about history

... well, don't use a scheduler.
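To make the trade-off concrete, here is roughly what "handling the workflow logic yourself" means for just two of the features listed above: parallel execution that waits for stragglers before continuing, and alerts on errors, plus a bare execution history. This is a hypothetical Python sketch: the job callables, the `alert` hook (standing in for an e-mail) and the in-memory `history` list are all assumptions, and persisting that history across runs is left out entirely.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict, List

# Hypothetical alert hook: in a real driver this might send an e-mail.
AlertFn = Callable[[str], None]


def run_parallel_then_continue(
    parallel: Dict[str, Callable[[], bool]],
    then: Callable[[], bool],
    alert: AlertFn,
    history: List[dict],
) -> bool:
    """Fork the `parallel` jobs, wait for all of them (stragglers included),
    and run `then` only if every one succeeded.

    Each finished job is appended to `history`; every failure calls `alert`.
    Returns True only if all jobs, including `then`, succeeded.
    """
    all_ok = True
    with cf.ThreadPoolExecutor(max_workers=max(1, len(parallel))) as pool:
        futures = {pool.submit(fn): name for name, fn in parallel.items()}
        # as_completed yields each future as it finishes; the loop only ends
        # once the slowest job (the straggler) is done.
        for fut in cf.as_completed(futures):
            name = futures[fut]
            try:
                succeeded = fut.result()
            except Exception as exc:  # a job that raised counts as failed
                succeeded = False
                alert(f"{name} raised {exc!r}")
            else:
                if not succeeded:
                    alert(f"{name} failed")
            history.append({"job": name, "ok": succeeded, "at": time.time()})
            all_ok = all_ok and succeeded
    if not all_ok:
        return False
    result = then()
    history.append({"job": "join-step", "ok": result, "at": time.time()})
    if not result:
        alert("join-step failed")
    return result


if __name__ == "__main__":
    log: List[dict] = []
    outcome = run_parallel_then_continue(
        {"job_a": lambda: True, "job_b": lambda: (time.sleep(0.1) or True)},
        then=lambda: True,
        alert=print,
        history=log,
    )
    print(outcome, [h["job"] for h in log])
```

A scheduler such as Oozie gives you the fork/join, error transitions and stored history declaratively; in the driver approach each of these is your own code to write and maintain.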

PS: you also have Luigi (Spotify) and Azkaban (LinkedIn) as alternative Hadoop schedulers.

[edit] One extra point to consider: if your "driver" crashes for whatever reason, you may not get a chance to send an alert; but if it runs from Oozie, the crash will be detected eventually (this may take as long as 30 minutes in a corner case, e.g. the ApplicationMaster (AM) of the job self-destructing because of a YARN ResourceManager (RM) failover).
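The underlying point is that failure detection has to live outside the process that might fail. If you stay with one big driver, a minimal stand-in is a heartbeat that the driver refreshes while it is alive, plus a separate monitor (for instance a cron job on the edge node) that treats a stale heartbeat as a crash. The file-based heartbeat, its location and the timeout are assumptions for illustration; this is not how Oozie itself detects failures.

```python
import os
import tempfile
import time
from typing import Optional


def write_heartbeat(path: str, now: Optional[float] = None) -> None:
    """Called periodically by the driver while it is alive."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(time.time() if now is None else now))
    # os.replace is an atomic rename on POSIX (same filesystem), so the
    # monitor never reads a half-written heartbeat file.
    os.replace(tmp, path)


def driver_looks_dead(path: str, timeout_s: float, now: Optional[float] = None) -> bool:
    """Run from a separate process (e.g. a cron job).

    Returns True if the heartbeat file is missing, unreadable, or older
    than timeout_s seconds.
    """
    current = time.time() if now is None else now
    try:
        with open(path) as f:
            last = float(f.read())
    except (OSError, ValueError):
        return True
    return current - last > timeout_s


if __name__ == "__main__":
    hb = os.path.join(tempfile.mkdtemp(), "driver.heartbeat")  # hypothetical location
    print(driver_looks_dead(hb, timeout_s=60))  # True: no heartbeat yet
    write_heartbeat(hb)
    print(driver_looks_dead(hb, timeout_s=60))  # False: fresh heartbeat
```

The monitor would then send the alert the crashed driver could not send itself.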
