I'm new to Apache Spark.
I'm trying to run a Spark session using PySpark, configured with 2 executors. Both executor pods need to pull my custom-built Spark image, which is hosted in a repo.
Below is the Python configuration for my Spark session/job:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("sparkpi-test1")
    .master("k8s://https://kubernetes.default:443")
    # Custom-built Spark image; "<repo>" is a placeholder for the image reference
    .config("spark.kubernetes.container.image", "<repo>")
    .config("spark.kubernetes.authenticate.caCertFile", "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
    .config("spark.kubernetes.authenticate.oauthTokenFile", "/var/run/secrets/kubernetes.io/serviceaccount/token")
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark-driver-0")
    .config("spark.executor.instances", 2)
    .config("spark.driver.host", "test")
    .config("spark.driver.port", "20020")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.hive.convertMetastoreParquet", "false")
    .config("spark.jars.packages", "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.1,org.apache.spark:spark-avro_2.12:3.1.2")
    .config("spark.kubernetes.node.selector.testNodeCategory", "ondemand")
    .getOrCreate()
)
sparkpi-test1-2341a185c8144b60-exec-1   0/1   ImagePullBackOff   0   5h17m
sparkpi-test1-2341a185c8144b60-exec-2   0/1   ImagePullBackOff   0   5h17m
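As a rough aid for spotting this state, here is a minimal sketch that scans text in the shape of the pod listing above and returns the pods stuck on an image pull. The function name `pods_with_pull_errors` is hypothetical, and the column order (NAME, READY, STATUS, RESTARTS, AGE) is assumed to match the default `kubectl get pods` layout.

```python
def pods_with_pull_errors(kubectl_output):
    """Return names of pods whose STATUS column reports an image pull failure.

    Assumes whitespace-separated columns in the order
    NAME READY STATUS RESTARTS AGE (hypothetical helper, not part of any library).
    """
    pull_failures = ("ImagePullBackOff", "ErrImagePull")
    failing = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines and anything too short to contain a STATUS column
        if len(fields) >= 3 and fields[2] in pull_failures:
            failing.append(fields[0])
    return failing


sample = """NAME READY STATUS RESTARTS AGE
sparkpi-test1-2341a185c8144b60-exec-1 0/1 ImagePullBackOff 0 5h17m
sparkpi-test1-2341a185c8144b60-exec-2 0/1 ImagePullBackOff 0 5h17m"""

print(pods_with_pull_errors(sample))
```

For any pod it reports, `kubectl describe pod <pod-name>` lists the events recorded for that pod, which normally include the reason the pull failed.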
Please correct me if I'm doing anything wrong. I'm trying to set up Spark in my existing Kubernetes cluster using my custom-built Spark image, which lives in a repo. I specified that image in the configuration in my Python file:
.config("spark.kubernetes.container.image", "<repo>")
According to the docs:
Container image to use for the Spark application. This is usually of the form example.com/repo/spark:v1.0.0. This configuration is required and must be provided by the user, unless explicit images are provided for each different container type.
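To make the two options in that passage concrete, here is a minimal sketch of a helper that builds either the single-image setting or the explicit per-container settings. The helper name `image_conf` and its parameters are hypothetical. The property names are Spark's Kubernetes settings; `spark.kubernetes.container.image.pullSecrets` is included only as an optional extra for private registries, and nothing in this post indicates it was the cause of the failure here.

```python
def image_conf(image=None, driver_image=None, executor_image=None, pull_secrets=None):
    """Build Spark-on-Kubernetes image settings as a dict (hypothetical helper).

    Either `image` is given, or both `driver_image` and `executor_image` are,
    mirroring the "required unless explicit images are provided" rule.
    """
    if not image and not (driver_image and executor_image):
        raise ValueError("provide `image`, or both `driver_image` and `executor_image`")

    conf = {}
    if image:
        conf["spark.kubernetes.container.image"] = image
    if driver_image:
        conf["spark.kubernetes.driver.container.image"] = driver_image
    if executor_image:
        conf["spark.kubernetes.executor.container.image"] = executor_image
    if pull_secrets:
        # Comma-separated names of Kubernetes secrets used to pull from private registries
        conf["spark.kubernetes.container.image.pullSecrets"] = ",".join(pull_secrets)
    return conf


# Example with the documentation's sample image name and a made-up secret name
settings = image_conf(image="example.com/repo/spark:v1.0.0", pull_secrets=["my-registry-secret"])
for key, value in settings.items():
    print(key, "=", value)
```

Each key/value pair could then be passed to `SparkSession.builder.config(key, value)` in the same way as the settings in the question.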
Why are my executor pods failing to pull the image from the registry? How can I pull it manually for the executors for the time being?
For reference, here is the message I'm seeing:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I guess the above warning appears because my executor pods were not created successfully.
I've got it. I was using Terraform to build all the resources. The .tfstate file had changed, and that was causing the pods to hit these errors.
Clearing the Terraform cache solved my problem.
To clear the Terraform cache, run the following in your Terraform directory:
rm -rf .terraform