
Spark on Kubernetes: Executor pods not able to start while creating SparkContext

I am trying to run Spark on Kubernetes with interactive commands run through the Spark shell or a Jupyter interface. I have built custom images for both the driver pod and the executor pods, and I use the code below to spin up the SparkContext:

import pyspark
conf = pyspark.SparkConf()
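# Point the master URL at the in-cluster Kubernetes API server: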
conf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
conf.set(
    "spark.kubernetes.container.image", 
    "<Repo>/<IMAGENAME>:latest") 

conf.set("spark.kubernetes.namespace": "default")

# Authentication certificate and token (required to create worker pods):
conf.set(
    "spark.kubernetes.authenticate.caCertFile", 
    "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
conf.set(
    "spark.kubernetes.authenticate.oauthTokenFile", 
    "/var/run/secrets/kubernetes.io/serviceaccount/token")

conf.set(
    "spark.kubernetes.authenticate.driver.serviceAccountName", 
    "spark-master") 
conf.set("spark.executor.instances", "2") 
conf.set(
    "spark.driver.host", "spark-test-jupyter") 
conf.set("spark.executor.memory", "1g")
conf.set("spark.executor.cores", "1")
conf.set("spark.driver.blockManager.port", "7777")
conf.set("spark.driver.bindAddress", "0.0.0.0")

conf.set("spark.driver.port", "29416") 

sc = pyspark.SparkContext(conf=conf)

The driver launches executor pods, but each pair of executors starts, errors out, and is replaced by a new pair that does the same. Logs below:

pyspark-shell-1620894878554-exec-8   0/1     Pending             0          0s
pyspark-shell-1620894878554-exec-8   0/1     ContainerCreating   0          0s
pyspark-shell-1620894878528-exec-7   1/1     Running             0          1s
pyspark-shell-1620894878554-exec-8   1/1     Running             0          2s
pyspark-shell-1620894878528-exec-7   0/1     Error               0          4s
pyspark-shell-1620894878554-exec-8   0/1     Error               0          4s
pyspark-shell-1620894878528-exec-7   0/1     Terminating         0          5s
pyspark-shell-1620894878528-exec-7   0/1     Terminating         0          5s
pyspark-shell-1620894878554-exec-8   0/1     Terminating         0          5s
pyspark-shell-1620894878554-exec-8   0/1     Terminating         0          5s
pyspark-shell-1620894883595-exec-9   0/1     Pending             0          0s
pyspark-shell-1620894883595-exec-9   0/1     Pending             0          0s
pyspark-shell-1620894883595-exec-9   0/1     ContainerCreating   0          0s
pyspark-shell-1620894883623-exec-10   0/1     Pending             0          0s
pyspark-shell-1620894883623-exec-10   0/1     Pending             0          0s
pyspark-shell-1620894883623-exec-10   0/1     ContainerCreating   0          0s
pyspark-shell-1620894883595-exec-9    1/1     Running             0          1s
pyspark-shell-1620894883623-exec-10   1/1     Running             0          3s

This goes on endlessly until stopped.

What could be going wrong here?

Your spark.driver.host should be the DNS name of the service, so something like spark-test-jupyter.default.svc.cluster.local
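
A minimal sketch of the corrected settings, assuming the Jupyter/driver pod sits behind a Kubernetes Service named spark-test-jupyter in the default namespace (both names taken from the config above):

# Fully qualified service DNS name, resolvable from every executor pod:
conf.set(
    "spark.driver.host",
    "spark-test-jupyter.default.svc.cluster.local")

# The Service must expose both of these ports, because the executors
# connect back to the driver on them:
conf.set("spark.driver.port", "29416")
conf.set("spark.driver.blockManager.port", "7777")

If spark.driver.host does not resolve from the executor pods, they start, fail to register with the driver, and exit, which matches the start-then-error loop in the logs above.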
