Cannot run pyspark jobs in client mode Kubernetes
I am deploying pyspark in my AKS Kubernetes cluster using this guide:
I have deployed my driver pod following the instructions in the link above:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: spark
  name: my-notebook-deployment
  labels:
    app: my-notebook
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-notebook
  template:
    metadata:
      labels:
        app: my-notebook
    spec:
      serviceAccountName: spark
      containers:
        - name: my-notebook
          image: pidocker-docker-registry.default.svc.cluster.local:5000/my-notebook:latest
          ports:
            - containerPort: 8888
          volumeMounts:
            - mountPath: /root/data
              name: my-notebook-pv
          workingDir: /root
          resources:
            limits:
              memory: 2Gi
      volumes:
        - name: my-notebook-pv
          persistentVolumeClaim:
            claimName: my-notebook-pvc
---
apiVersion: v1
kind: Service
metadata:
  namespace: spark
  name: my-notebook-deployment
spec:
  selector:
    app: my-notebook
  ports:
    - protocol: TCP
      port: 29413
  clusterIP: None
```
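Note that the service is headless (`clusterIP: None`) and selects the notebook pod, so the DNS name `my-notebook-deployment.spark.svc.cluster.local` resolves directly to that pod's IP. If you want to confirm that the name you pass as `spark.driver.host` actually resolves inside the cluster, a one-line check like the following can help (a diagnostic sketch of mine, not part of the guide):

```python
# Diagnostic sketch: run from any pod in the cluster to verify that the
# headless service name used as spark.driver.host resolves to a pod IP.
import socket

print(socket.gethostbyname("my-notebook-deployment.spark.svc.cluster.local"))
```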
I can then create the Spark cluster with the following code:
```python
import os
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# Create Spark config for our Kubernetes based cluster manager
sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
sparkConf.setAppName("spark")
sparkConf.set("spark.kubernetes.container.image", "<MYIMAGE>")
sparkConf.set("spark.kubernetes.namespace", "spark")
sparkConf.set("spark.executor.instances", "7")
sparkConf.set("spark.executor.cores", "2")
sparkConf.set("spark.driver.memory", "512m")
sparkConf.set("spark.executor.memory", "512m")
sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3")
sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
sparkConf.set("spark.kubernetes.authenticate.serviceAccountName", "spark")
sparkConf.set("spark.driver.port", "29413")
sparkConf.set("spark.driver.host", "my-notebook-deployment.spark.svc.cluster.local")

# Initialize our Spark cluster, this will actually
# generate the worker nodes.
spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
sc = spark.sparkContext
```
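Once the session is up, a trivial job is enough to confirm that the executor pods were created and can talk back to the driver. A minimal smoke test using the `sc` defined above:

```python
# Minimal smoke test: ship a trivial computation to the executors and
# collect the result back on the driver.
rdd = sc.parallelize(range(1000))
print(rdd.map(lambda x: x * x).sum())  # expected: 332833500
```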
It works. However, when I try to run the same Spark job from another pod, I get the following error:
```
<ipython-input-1-ca9882ef9eeb> in <module>
     25 # Initialize our Spark cluster, this will actually
     26 # generate the worker nodes.
---> 27 spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
     28 sc = spark.sparkContext

/usr/local/spark/python/pyspark/sql/session.py in getOrCreate(self)
    171         for key, value in self._options.items():
    172             sparkConf.set(key, value)
--> 173         sc = SparkContext.getOrCreate(sparkConf)
    174         # This SparkContext may be an existing one.
    175         for key, value in self._options.items():

/usr/local/spark/python/pyspark/context.py in getOrCreate(cls, conf)
    365         with SparkContext._lock:
    366             if SparkContext._active_spark_context is None:
--> 367                 SparkContext(conf=conf or SparkConf())
    368             return SparkContext._active_spark_context
    369

/usr/local/spark/python/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    134         try:
    135             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
--> 136                           conf, jsc, profiler_cls)
    137         except:
    138             # If an error occurs, clean up in order to allow future SparkContext creation:

/usr/local/spark/python/pyspark/context.py in _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, jsc, profiler_cls)
    196
    197         # Create the Java SparkContext through Py4J
--> 198         self._jsc = jsc or self._initialize_context(self._conf._jconf)
    199         # Reset the SparkConf to the one actually used by the SparkContext in JVM.
    200         self._conf = SparkConf(_jconf=self._jsc.sc().conf())

/usr/local/spark/python/pyspark/context.py in _initialize_context(self, jconf)
    304         Initialize SparkContext in function to allow subclass specific initialization
    305         """
--> 306         return self._jvm.JavaSparkContext(jconf)
    307
    308     @classmethod

/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1523         answer = self._gateway_client.send_command(command)
   1524         return_value = get_return_value(
-> 1525             answer, self._gateway_client, None, self._fqn)
   1526
   1527         for temp_arg in temp_args:

/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (starting from 29414)! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:220)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:128)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1283)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:989)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:364)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
```
As far as I can tell, I am running my Spark cluster in client mode, with the Jupyter pod acting as the master node and creating the worker pods. This works when I run the code inside the Jupyter pod, but it fails when another pod tries to do the same, presumably because `spark.driver.host` still resolves to the notebook pod rather than the new one, so the `sparkDriver` port cannot be bound there.
How can I fix this?
I had a similar issue, and in the end I created the services needed by the client pod manually. In my case, I wanted to deploy the spark-thrift server, which does not support cluster mode.
First you need to create the service required by the Spark blockManager and the driver itself:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-thrift
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 4000
      name: driver
    - protocol: TCP
      port: 4001
      name: block-manager
```
Now you can start your driver like this (note: the example uses the old `apps/v1beta1` Deployment API; on current clusters this would be `apps/v1`, which also requires a `selector`):
```yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: spark-thrift
  labels:
    app: spark-thrift
spec:
  template:
    spec:
      containers:
        - name: spark-thrift-driver
          image: image:latest
          command:
            - /opt/spark/bin/spark-submit
          args:
            - "--name"
            - "spark-thrift"
            - "--class"
            - "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"
            - "--conf"
            - "spark.driver.port=4000"
            - "--conf"
            - "spark.driver.host=spark-thrift"
            - "--conf"
            - "spark.driver.bindAddress=0.0.0.0"
            - "--conf"
            - "spark.driver.blockManager.port=4001"
          imagePullPolicy: Always
          ports:
            - name: driver
              containerPort: 4000
            - name: blockmanager
              containerPort: 4001
```
The important arguments here are the `spark.driver.*` settings: `spark.driver.host` must match the name of the service, `spark.driver.bindAddress` must be `0.0.0.0` so the driver binds to all interfaces instead of trying to bind to the service hostname, and `spark.driver.port` / `spark.driver.blockManager.port` must match the ports exposed by the service.
Obviously this is not a complete working pod example; you still need to configure your own ports and volumes in the spec.
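Translated back to the question's setup, the same fix would mean giving the second pod its own Service and pointing the `spark.driver.*` options at it. A sketch of what that could look like, assuming a hypothetical Service named `my-job` in the `spark` namespace that selects the second pod and exposes ports 29413 (driver) and 29414 (block manager):

```python
# Sketch under the assumptions above; "my-job" and port 29414 are
# hypothetical, not from the original post.
from pyspark import SparkConf
from pyspark.sql import SparkSession

sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
sparkConf.setAppName("spark-from-another-pod")
sparkConf.set("spark.kubernetes.container.image", "<MYIMAGE>")
sparkConf.set("spark.kubernetes.namespace", "spark")
sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
sparkConf.set("spark.driver.port", "29413")
sparkConf.set("spark.driver.blockManager.port", "29414")  # hypothetical port
# Advertise the pod's own service to the executors...
sparkConf.set("spark.driver.host", "my-job.spark.svc.cluster.local")
# ...but bind to all interfaces, not to the advertised hostname.
sparkConf.set("spark.driver.bindAddress", "0.0.0.0")

spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
```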