
Which scheduler should I use to submit Spark jobs to a Google Cloud Dataproc cluster?

I have a few Spark jobs that I need to schedule two or three times a day, depending on the process requirements; they are essentially batch jobs. On premises, we ran them on a Hadoop system and used Apache Oozie workflows for orchestration. Now that we are on Google Cloud, will the same setup work well, or should I switch to Cloud Composer? I know Composer is a managed service provided by Google, whereas with Oozie I would have to do the setup work myself. On the other hand, with Oozie the code changes would be minimal, while with Composer I would have to rewrite the scheduler jobs, which might also mean some minor process changes. I'm not even sure whether the Oozie integration will work as expected, since from the cloud's point of view it would be an external service. Which scheduler will save me time and is better suited to this kind of batch job?

I'm going to take a stab at this: it depends on how complex your Dataproc job submission is. If it's a minimal submission with few or no arguments, and you don't need to specify job IDs (you plan to use labels instead), Cloud Scheduler actually works well for your specific purpose of running two to three times a day, and it is DEAD SIMPLE.
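For the Cloud Scheduler route, a scheduler job with an HTTP target can POST straight to the Dataproc v1 `jobs.submit` REST endpoint. The following is a minimal sketch, assuming the Spark job is packaged as a JAR already in Cloud Storage and that a service account with permission to submit Dataproc jobs exists; the project, region, cluster, main class, bucket and service-account names are hypothetical placeholders. It only builds the Cloud Scheduler `Job` resource as JSON (you would still create it through the console, `gcloud`, or the Cloud Scheduler API). It deliberately omits `reference.jobId`, so Dataproc generates a fresh ID on every trigger and the same request body can be re-sent each time, while labels identify the runs.

```python
import base64
import json

# Hypothetical values for illustration only; replace with your own.
PROJECT = "my-project"
REGION = "us-central1"
CLUSTER = "my-cluster"
SERVICE_ACCOUNT = "scheduler-sa@my-project.iam.gserviceaccount.com"


def dataproc_submit_body(main_class, jar_uri, args, labels):
    """Request body for the Dataproc v1 jobs.submit method (Spark job)."""
    return {
        "job": {
            "placement": {"clusterName": CLUSTER},
            # No "reference": {"jobId": ...} here, so Dataproc generates a
            # unique job ID for each scheduled run.
            "labels": labels,
            "sparkJob": {
                "mainClass": main_class,
                "jarFileUris": [jar_uri],
                "args": list(args),
            },
        }
    }


def scheduler_job(name, schedule, body, time_zone="Etc/UTC"):
    """Cloud Scheduler v1 Job resource that POSTs the body to Dataproc."""
    uri = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{PROJECT}/regions/{REGION}/jobs:submit"
    )
    payload = json.dumps(body).encode("utf-8")
    return {
        "name": f"projects/{PROJECT}/locations/{REGION}/jobs/{name}",
        "schedule": schedule,  # unix-cron format
        "timeZone": time_zone,
        "httpTarget": {
            "uri": uri,
            "httpMethod": "POST",
            "headers": {"Content-Type": "application/json"},
            # Bytes fields are base64-encoded in the JSON representation.
            "body": base64.b64encode(payload).decode("ascii"),
            # Targets on *.googleapis.com are called with an OAuth access token.
            "oauthToken": {
                "serviceAccountEmail": SERVICE_ACCOUNT,
                "scope": "https://www.googleapis.com/auth/cloud-platform",
            },
        },
    }


if __name__ == "__main__":
    body = dataproc_submit_body(
        main_class="com.example.BatchJob",  # hypothetical
        jar_uri="gs://my-bucket/jars/batch-job.jar",  # hypothetical
        args=["--mode", "batch"],
        labels={"pipeline": "spark-batch"},
    )
    # Three runs a day: 06:00, 14:00 and 22:00 in the job's time zone.
    job = scheduler_job("spark-batch", "0 6,14,22 * * *", body)
    print(json.dumps(job, indent=2))
```

If the submission later needs per-run arguments, dependencies between jobs, or retries across several steps, a static request body like this stops being enough, which is where an orchestrator earns its keep.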

However, if you need more complex functionality, Cloud Composer is a good future-facing option, although, as you predicted, it means quite a lot of code changes.
