
Running a PySpark job on a Kubernetes Spark cluster

I am trying to run a Spark job against a separate Spark master hosted on Kubernetes, but port forwarding reports the following error:

E0206 19:52:24.846137   14968 portforward.go:400] an error occurred forwarding 7077 -> 7077: error forwarding port 7077 to pod 1cf922cbe9fc820ea861077c030a323f6dffd4b33bb0c354431b4df64e0db413, uid : exit status 1: 2022/02/07 00:52:26 socat[25402] E connect(16, AF=2 127.0.0.1:7077, 16): Connection refused

After tinkering with it a bit more, I noticed this output when launching the Helm chart for Apache Spark: ** IMPORTANT: When submit an application from outside the cluster service type should be set to the NodePort or LoadBalancer. **

This led me to research Kubernetes networking a bit more. To submit a job, it is not sufficient to forward port 7077; the cluster service itself needs an externally reachable IP. That means launching the Helm chart with the service type and load balancer IP set explicitly: helm install my-release --set service.type=LoadBalancer --set service.loadBalancerIP=192.168.2.50 bitnami/spark. The IP above is my host's address, which is reachable from the Docker containers.

With the LoadBalancer IP assigned, Spark will run using the example code provided.
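For reference, here is a minimal PySpark sketch of the kind of connectivity check that works once the service is exposed. It assumes the master is reachable at spark://192.168.2.50:7077 (7077 being the standard Spark master port); the app name and the trivial sum job are my own placeholders, not part of the chart output.

# Connect to the Spark master exposed through the LoadBalancer IP.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://192.168.2.50:7077")   # LoadBalancer IP set via the Helm chart
    .appName("k8s-connectivity-check")      # placeholder app name
    .getOrCreate()
)

# A tiny job to confirm the driver can reach the master and the executors respond.
print(spark.sparkContext.parallelize(range(100)).sum())

spark.stop()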

Recap: don't rely on port forwarding to submit jobs; the cluster service needs an externally reachable IP (NodePort or LoadBalancer).
