
Two separate images to run Spark in client mode on Kubernetes with Python and Apache Spark 3.2.0?

I built a Python image for Apache Spark 3.2.0 by running this script from the distribution folder:

./bin/docker-image-tool.sh -r <repo> -t my-tag -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
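If the image also needs to be pushed to a registry, the same tool has a push step; a minimal sketch, assuming the same repo and tag as the build above:

./bin/docker-image-tool.sh -r <repo> -t my-tag push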

I can create a container on K8s with spark-submit just fine. My goal is to run spark-submit configured for client mode (rather than local mode) and have additional containers created for the executors.

Does the image I created allow for this, or do I need to build a second image (without the -p option) with docker-image-tool.sh and configure it in a different container?

It turns out that only one image is needed if you're running PySpark. In client mode, Spark spawns the executor pods for you once you issue the spark-submit command. A big improvement over Spark 2.4!
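For reference, here is a minimal sketch of a client-mode PySpark session against Kubernetes using that single image. The API server address, namespace, executor count, and driver host below are placeholder assumptions, not values from the original post:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Kubernetes API server address (placeholder)
    .master("k8s://https://<k8s-apiserver>:6443")
    .appName("client-mode-example")
    # In client mode only the executors run in pods; they use the image built above
    .config("spark.kubernetes.container.image", "<repo>/spark-py:my-tag")
    .config("spark.kubernetes.namespace", "default")
    .config("spark.executor.instances", "2")
    # The driver runs locally, so executor pods must be able to reach it
    .config("spark.driver.host", "<driver-reachable-host>")
    .getOrCreate()
)

print(spark.range(1000).selectExpr("sum(id) AS total").collect())
spark.stop()

Once getOrCreate() runs, the Kubernetes scheduler backend requests the executor pods itself; no second image built without the -p option is required.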
