
How to schedule jobs in a Spark cluster using Kubernetes

I am rather new to both Spark and Kubernetes, but I am trying to understand how this can work in a production environment. I am planning to use Kubernetes to deploy a Spark cluster. I will then use Spark Streaming to process data from Kafka and write the results to a database. Furthermore, I am planning to set up a scheduled Spark batch job that runs every night.

1. How do I schedule the nightly batch runs? I understand that Kubernetes has a cron-like feature (see the documentation), but from my understanding it is meant for scheduling container deployments. My containers will already be up and running (since I use the Spark cluster for Spark Streaming); I just want to submit a job to the cluster every night.

2. Where do I store the Spark Streaming application(s) (there might be many), and how do I start them? Should I separate the Spark container from the Spark Streaming application (i.e. should the container only contain a clean Spark node, with the application kept in persistent storage and the job pushed to the container using kubectl)? Or should my Dockerfile clone the Spark Streaming application from a repository and be responsible for starting it?
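For illustration, the second approach (the image already contains the application) might look roughly like the sketch below. It assumes a standalone Spark master reachable inside the cluster at spark://spark-master:7077; all image, class, and path names are made up:

# Hypothetical Deployment that runs a Spark Streaming driver.
# The image is assumed to contain both Spark and the application jar.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: streaming-driver            # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: streaming-driver
  template:
    metadata:
      labels:
        app: streaming-driver
    spec:
      containers:
        - name: driver
          image: myrepo/spark-streaming-app:latest   # built from a Dockerfile that copies the jar in
          command: ["/opt/spark/bin/spark-submit"]
          args:
            - "--master"
            - "spark://spark-master:7077"            # standalone master Service inside the cluster
            - "--class"
            - "com.example.StreamingJob"
            - "/opt/app/streaming-job.jar"

In this variant the image is built from a Dockerfile that copies (or clones and builds) the application jar, so nothing has to be pushed into the running container afterwards.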

I have tried looking through the documentation, but I am unsure how to set this up. Any link or reference that answers my question is highly appreciated.

You should absolutely use the CronJob resource for the nightly batch runs. See also these repos, which help bootstrap Spark on Kubernetes:

https://github.com/ramhiser/spark-kubernetes

https://github.com/navicore/spark-on-kubernetes
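A minimal sketch of such a CronJob is shown below; it assumes an image that already contains spark-submit and the batch jar, plus a standalone master Service at spark://spark-master:7077 (all names are illustrative):

# Hypothetical CronJob that submits the nightly batch job to the
# already running Spark cluster; it does not redeploy the cluster itself.
apiVersion: batch/v1                # use batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: nightly-spark-batch         # illustrative name
spec:
  schedule: "0 2 * * *"             # every night at 02:00
  concurrencyPolicy: Forbid         # skip a run if the previous one is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: spark-submit
              image: myrepo/spark-batch-app:latest   # contains spark-submit and the batch jar
              command: ["/opt/spark/bin/spark-submit"]
              args:
                - "--master"
                - "spark://spark-master:7077"
                - "--class"
                - "com.example.NightlyBatchJob"
                - "/opt/app/batch-job.jar"

Note that the CronJob only starts a short-lived client pod that runs spark-submit against the already running cluster, so it submits a job rather than redeploying containers, which addresses the concern in question 1.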
