
Orchestration of Apache Spark using Apache Oozie

We are considering integrating Apache Spark into our calculation process, for which we originally planned to use Apache Oozie with standard MapReduce (MR) or map-only (MO) jobs.

After some research several questions remain:

  1. Is it possible to orchestrate an Apache Spark process using Apache Oozie? If yes, how?
  2. Is Oozie still necessary, or can Spark handle orchestration by itself? (Unification seems to be one of Spark's main goals.)

Please consider the following scenarios when answering:

  1. Executing a workflow every 4 hours
  2. Executing a workflow whenever specific data becomes available
  3. Triggering a workflow and configuring it with parameters

Thanks for your answers in advance.

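Spark is supported as an action type in Oozie 4.2 and later; see the Oozie Spark action documentation. The scenarios you mention are all standard Oozie features: a coordinator can run a workflow at a fixed frequency (for example every 4 hours), it can wait for input data to become available via datasets and input events, and a workflow can be parameterized through job properties and coordinator configuration.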
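As a rough illustration, a workflow with a single Spark action might look like the sketch below. The application name, main class, jar path, and the `inputDir`/`outputDir` parameters are hypothetical placeholders, and the schema versions shown are the ones I would expect for Oozie 4.2, so check them against your installation.

```xml
<!-- workflow.xml: minimal sketch of a workflow with one Spark action.
     com.example.Calc, calc.jar, inputDir and outputDir are made-up names. -->
<workflow-app name="spark-calc-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-calc"/>
    <action name="spark-calc">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>SparkCalc</name>
            <class>com.example.Calc</class>
            <jar>${nameNode}/apps/calc/lib/calc.jar</jar>
            <!-- Parameters are handed to the Spark main method as arguments -->
            <arg>${inputDir}</arg>
            <arg>${outputDir}</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

A coordinator along the following lines would then cover the three scenarios: it materializes an action every 4 hours, holds each action until the matching input directory contains a done flag, and passes the resolved path to the workflow as a parameter. The dataset URI layout, dates, and property names are again assumptions for the sake of the example.

```xml
<!-- coordinator.xml: sketch only; paths, dates and names are placeholders. -->
<coordinator-app name="spark-calc-coord" frequency="${coord:hours(4)}"
                 start="${startTime}" end="${endTime}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <dataset name="input" frequency="${coord:hours(4)}"
                 initial-instance="${startTime}" timezone="UTC">
            <uri-template>${nameNode}/data/in/${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
            <!-- The action waits until this file exists in the directory -->
            <done-flag>_SUCCESS</done-flag>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="in" dataset="input">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${workflowAppPath}</app-path>
            <configuration>
                <property>
                    <name>inputDir</name>
                    <value>${coord:dataIn('in')}</value>
                </property>
                <property>
                    <name>outputDir</name>
                    <value>${outputBase}/${coord:formatTime(coord:nominalTime(), 'yyyyMMddHH')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

Values such as `jobTracker`, `nameNode`, `startTime`, `endTime`, `workflowAppPath`, and `outputBase` would normally come from the job.properties file supplied at submission time, which is also where you would set per-run parameters when triggering the job manually.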

