Orchestration of Apache Spark using Apache Oozie

We are thinking of integrating Apache Spark into our calculation process, for which we originally wanted to use Apache Oozie and standard MR or MO (Map-Only) jobs.

After some research, several questions remain:

  1. Is it possible to orchestrate an Apache Spark process using Apache Oozie? If so, how?
  2. Is Oozie still necessary, or could Spark handle orchestration by itself? (Unification seems to be one of the main concerns in Spark.)

Please consider the following scenarios when answering:

  1. executing a workflow every 4 hours
  2. executing a workflow whenever specific data becomes available
  3. triggering a workflow and configuring it with parameters

Thanks in advance for your answers.

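Spark is supported as an action type in Oozie 4.2; see the docs. The scenarios you mentioned are all standard Oozie features: a coordinator can start a workflow at a fixed frequency (e.g. every 4 hours) or once an input dataset becomes available, and workflows accept parameters through job properties.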
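To illustrate how the three scenarios could map onto Oozie, here is a minimal sketch, assuming Oozie 4.2+ with the Spark sharelib installed. It is a Python script that generates the three files such a setup typically uses: a `workflow.xml` containing a single Spark action, a `coordinator.xml` that runs every 4 hours but only once the input data for that slot exists (scenarios 1 and 2), and a `job.properties` that supplies parameters (scenario 3). The class name, jar path, HDFS layout, host names, ports and schema versions are all hypothetical placeholders; check the element order against the schema versions your Oozie server supports.

```python
import xml.etree.ElementTree as ET

# Schema versions are assumptions; use the ones your Oozie server supports.
WF_NS = "uri:oozie:workflow:0.5"
SPARK_NS = "uri:oozie:spark-action:0.1"   # Spark action, Oozie 4.2+
COORD_NS = "uri:oozie:coordinator:0.4"


def sub(parent, tag, text=None, attrs=None):
    """Append a child element with optional text and attributes."""
    el = ET.SubElement(parent, tag, attrs or {})
    if text is not None:
        el.text = text
    return el


def spark_workflow(main_class, jar_path):
    """workflow.xml: start -> Spark action -> end (or kill on error)."""
    wf = ET.Element("workflow-app", {"xmlns": WF_NS, "name": "spark-wf"})
    sub(wf, "start", attrs={"to": "spark-node"})
    action = sub(wf, "action", attrs={"name": "spark-node"})
    spark = sub(action, "spark", attrs={"xmlns": SPARK_NS})
    sub(spark, "job-tracker", "${jobTracker}")
    sub(spark, "name-node", "${nameNode}")
    sub(spark, "master", "yarn-cluster")  # Spark 1.x style master string
    sub(spark, "name", "MySparkJob")      # hypothetical job name
    sub(spark, "class", main_class)
    sub(spark, "jar", jar_path)
    # Scenario 3: ${inputDir} and ${outputDir} are resolved from job.properties
    # or from the coordinator that starts this workflow.
    sub(spark, "arg", "${inputDir}")
    sub(spark, "arg", "${outputDir}")
    sub(action, "ok", attrs={"to": "end"})
    sub(action, "error", attrs={"to": "fail"})
    kill = sub(wf, "kill", attrs={"name": "fail"})
    sub(kill, "message", "Spark failed: ${wf:errorMessage(wf:lastErrorNode())}")
    sub(wf, "end", attrs={"name": "end"})
    return wf


def coordinator(workflow_path, start, end):
    """coordinator.xml: every 4 hours, but only once the input data exists."""
    every_4h = "${coord:hours(4)}"  # scenario 1: fixed frequency
    coord = ET.Element("coordinator-app", {
        "xmlns": COORD_NS, "name": "spark-coord", "frequency": every_4h,
        "start": start, "end": end, "timezone": "UTC"})
    # Scenario 2: a dataset instance counts as ready once _SUCCESS appears.
    datasets = sub(coord, "datasets")
    ds = sub(datasets, "dataset", attrs={
        "name": "input", "frequency": every_4h,
        "initial-instance": start, "timezone": "UTC"})
    sub(ds, "uri-template", "${nameNode}/data/input/${YEAR}/${MONTH}/${DAY}/${HOUR}")
    sub(ds, "done-flag", "_SUCCESS")
    events = sub(coord, "input-events")
    data_in = sub(events, "data-in", attrs={"name": "inputReady", "dataset": "input"})
    sub(data_in, "instance", "${coord:current(0)}")
    # Each materialized action starts the workflow and passes it parameters.
    wf = sub(sub(coord, "action"), "workflow")
    sub(wf, "app-path", workflow_path)
    conf = sub(wf, "configuration")
    for name, value in [("inputDir", "${coord:dataIn('inputReady')}"),
                        ("outputDir", "${outputDir}")]:
        prop = sub(conf, "property")
        sub(prop, "name", name)
        sub(prop, "value", value)
    return coord


def job_properties(params):
    """Render job.properties (key=value lines) used when submitting the job."""
    return "\n".join(f"{k}={v}" for k, v in params.items()) + "\n"


if __name__ == "__main__":
    # Hypothetical cluster addresses and paths; replace with your own.
    workflow = spark_workflow("com.example.MyJob",
                              "${nameNode}/apps/spark/lib/my-job.jar")
    coord = coordinator("${nameNode}/apps/spark",
                        "2016-01-01T00:00Z", "2017-01-01T00:00Z")
    props = job_properties({
        "nameNode": "hdfs://namenode:8020",
        "jobTracker": "resourcemanager:8032",
        "oozie.use.system.libpath": "true",  # pulls in the Spark sharelib
        "oozie.coord.application.path": "${nameNode}/apps/spark",
        "outputDir": "${nameNode}/data/output",
    })
    for name, text in [("workflow.xml", ET.tostring(workflow, encoding="unicode")),
                       ("coordinator.xml", ET.tostring(coord, encoding="unicode")),
                       ("job.properties", props)]:
        print(f"--- {name} ---\n{text}")
```

After uploading `workflow.xml`, `coordinator.xml` and the jar to the HDFS directory referenced in `job.properties`, the coordinator would be submitted with `oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run`. For a one-off, parameterized run without the coordinator (scenario 3), `job.properties` would set `oozie.wf.application.path` instead of `oozie.coord.application.path` and supply `inputDir` and `outputDir` directly.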
