
How to schedule jobs in a Spark cluster using Kubernetes

I am rather new to both Spark and Kubernetes, but I am trying to understand how this can work in a production environment. I am planning to use Kubernetes to deploy a Spark cluster. I will then use Spark Streaming to process data from Kafka and write the results to a database. Furthermore, I am planning to set up a scheduled Spark batch job that runs every night.

1. How do I schedule the nightly batch runs? I understand that Kubernetes has a cron-like feature (see the documentation), but from my understanding it is meant for scheduling container deployments. My containers will already be up and running (since I use the Spark cluster for Spark Streaming); I just want to submit a job to the cluster every night.

2. Where do I store the Spark Streaming application(s) (there might be many), and how do I start them? Should I separate the Spark container from the Spark Streaming application (i.e. should the container only contain a clean Spark node, with the application kept in persistent storage and the job pushed to the container using kubectl)? Or should my Dockerfile clone the Spark Streaming application from a repository and be responsible for starting it?
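For illustration, the second approach (the image already contains the application) might look roughly like the sketch below. It assumes a standalone Spark master reachable inside the cluster at spark://spark-master:7077; all image, class, and path names are made up:

# Hypothetical Deployment that runs a Spark Streaming driver.
# The image is assumed to contain both Spark and the application jar.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: streaming-driver            # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: streaming-driver
  template:
    metadata:
      labels:
        app: streaming-driver
    spec:
      containers:
        - name: driver
          image: myrepo/spark-streaming-app:latest   # built from a Dockerfile that copies the jar in
          command: ["/opt/spark/bin/spark-submit"]
          args:
            - "--master"
            - "spark://spark-master:7077"            # standalone master Service inside the cluster
            - "--class"
            - "com.example.StreamingJob"
            - "/opt/app/streaming-job.jar"

In this variant the image is built from a Dockerfile that copies (or clones and builds) the application jar, so nothing has to be pushed into the running container afterwards.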

I have tried looking through the documentation, but I am unsure how to set this up. Any link or reference that answers my question is highly appreciated.

You should absolutely use the CronJob resource for the nightly batch runs. See also these repos, which help bootstrap Spark on Kubernetes:

https://github.com/ramhiser/spark-kubernetes

https://github.com/navicore/spark-on-kubernetes
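A minimal sketch of such a CronJob is shown below; it assumes an image that already contains spark-submit and the batch jar, plus a standalone master Service at spark://spark-master:7077 (all names are illustrative):

# Hypothetical CronJob that submits the nightly batch job to the
# already running Spark cluster; it does not redeploy the cluster itself.
apiVersion: batch/v1                # use batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: nightly-spark-batch         # illustrative name
spec:
  schedule: "0 2 * * *"             # every night at 02:00
  concurrencyPolicy: Forbid         # skip a run if the previous one is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: spark-submit
              image: myrepo/spark-batch-app:latest   # contains spark-submit and the batch jar
              command: ["/opt/spark/bin/spark-submit"]
              args:
                - "--master"
                - "spark://spark-master:7077"
                - "--class"
                - "com.example.NightlyBatchJob"
                - "/opt/app/batch-job.jar"

Note that the CronJob only starts a short-lived client pod that runs spark-submit against the already running cluster, so it submits a job rather than redeploying containers, which addresses the concern in question 1.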
