Spark Kubernetes client mode (separate driver pod) setup
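I'm trying to get a Spark on Kubernetes setup working in which the Spark driver runs in its own separate pod (client mode) and bootstraps the cluster through the SparkSession.builder mechanism (not spark-submit).
I'm working from this:
https://spark.apache.org/docs/latest/running-on-kubernetes.html
Here is the code the driver uses to bootstrap the cluster: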
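import org.apache.spark.sql.SparkSession

// Client mode: the driver runs in this pod and requests executor pods from Kubernetes.
val sparkSession = SparkSession.builder
  .master("k8s://https://kubernetes.default.svc:32768")
  .appName("test")
  // Host and ports the executors use to connect back to the driver.
  .config("spark.driver.host", "sparkrunner-0")
  .config("spark.driver.port", "7077")
  .config("spark.driver.blockManager.port", "7078")
  // Image used for the executor pods.
  .config("spark.kubernetes.container.image", "spark-alluxio")
  .config("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
  .config("fs.alluxio-ft.impl", "alluxio.hadoop.FaultTolerantFileSystem")
  .getOrCreate()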
The container image (spark-alluxio) was built by adding the Alluxio client library to the binary Spark distribution (2.4.2).
Here is the Kubernetes YAML used to deploy the driver, which lives in a StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sparkrunner
  labels:
    app: sparkrunner
spec:
  selector:
    matchLabels:
      app: sparkrunner
  serviceName: sparkrunner
  replicas: 1
  template:
    metadata:
      labels:
        app: sparkrunner
    spec:
      containers:
        - name: sparkrunner
          image: "rb/sparkrunner:latest"
          imagePullPolicy: Never
          ports:
            - name: application
              containerPort: 9100
            - name: driver-rpc-port
              containerPort: 7077
            - name: blockmanager
              containerPort: 7078
Here is the Kubernetes YAML that deploys the services in front of the driver:
# Headless service for stable DNS entries of StatefulSet members.
apiVersion: v1
kind: Service
metadata:
  name: sparkrunner
spec:
  ports:
    - name: driver-rpc-port
      protocol: TCP
      port: 7077
      targetPort: 7077
    - name: blockmanager
      protocol: TCP
      port: 7078
      targetPort: 7078
  clusterIP: None
  selector:
    app: sparkrunner
---
# Client service for connecting to any spark instance.
apiVersion: v1
kind: Service
metadata:
  name: sparkdriver
spec:
  type: NodePort
  ports:
    - name: sparkdriver
      port: 9100
  selector:
    app: sparkrunner
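When I deploy this to the cluster, the driver starts up, but when it tries to find executors, things fail with a socket exception, presumably because the workers can't connect back to the driver, or vice versa?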
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/04/26 20:24:39 INFO SparkContext: Running Spark version 2.4.2
20/04/26 20:24:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/04/26 20:24:40 INFO SparkContext: Submitted application: test
20/04/26 20:24:40 INFO SecurityManager: Changing view acls to: root
20/04/26 20:24:40 INFO SecurityManager: Changing modify acls to: root
20/04/26 20:24:40 INFO SecurityManager: Changing view acls groups to:
20/04/26 20:24:40 INFO SecurityManager: Changing modify acls groups to:
20/04/26 20:24:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/04/26 20:24:41 INFO Utils: Successfully started service 'sparkDriver' on port 7077.
20/04/26 20:24:41 INFO SparkEnv: Registering MapOutputTracker
20/04/26 20:24:41 INFO SparkEnv: Registering BlockManagerMaster
20/04/26 20:24:41 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/04/26 20:24:41 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/04/26 20:24:41 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e8aa33ba-26d2-421d-9957-9cba1c9a3b9f
20/04/26 20:24:41 INFO MemoryStore: MemoryStore started with capacity 1150.2 MB
20/04/26 20:24:41 INFO SparkEnv: Registering OutputCommitCoordinator
20/04/26 20:24:41 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/04/26 20:24:41 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://sparkrunner-0:4040
20/04/26 20:24:53 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7078.
20/04/26 20:24:53 INFO NettyBlockTransferService: Server created on sparkrunner-0:7078
20/04/26 20:24:53 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/04/26 20:24:53 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, sparkrunner-0, 7078, None)
20/04/26 20:24:53 INFO BlockManagerMasterEndpoint: Registering block manager sparkrunner-0:7078 with 1150.2 MB RAM, BlockManagerId(driver, sparkrunner-0, 7078, None)
20/04/26 20:24:53 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, sparkrunner-0, 7078, None)
20/04/26 20:24:53 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, sparkrunner-0, 7078, None)
20/04/26 20:24:53 WARN WatchConnectionManager: Exec Failure
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:246)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:166)
at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257)
at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:107)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:200)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
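From this I can't really tell what is failing: is it a problem with the service definitions or with the driver itself? I've tried fiddling with the selectors and hostnames, but nothing seems to have any effect.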
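The trace above is logged by WatchConnectionManager, with frames from the io.fabric8.kubernetes.client package, so one thing worth ruling out is whether the driver pod can open a TCP connection to the configured API server address at all. Below is a minimal sketch of such a check using only the JDK; the object name ApiServerProbe, the method canConnect and the 2-second default timeout are made up for illustration and are not part of Spark.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object ApiServerProbe {
  // Returns true if a TCP connection to host:port can be opened within timeoutMs.
  // A refused connection, an unresolvable host name or a timeout all yield false.
  def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean =
    Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), timeoutMs)
      finally socket.close()
    }.isSuccess
}
```

Run from inside the driver pod, a call such as ApiServerProbe.canConnect("kubernetes.default.svc", 32768) would show whether the host and port from the master URL are reachable before any SparkSession is created. This only tests TCP reachability; it says nothing about TLS or authentication.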
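After some more poking and prodding, I found that the address I was using for the k8s API server was incorrect:
k8s://https://kubernetes.default.svc:32768
I got this from kubectl cluster-info, but my minikube instance may be reporting it wrongly (or it may be the externally proxied address). When I replaced it with this:
k8s://https://10.96.0.1:443
which is the internal address of the API server, things started working.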
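Hard-coding the ClusterIP works on this cluster but may not carry over to another one. A possible alternative, sketched here under the assumption that the driver runs inside a pod: Kubernetes injects the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables into containers, so the master URL can be assembled from them. The object name K8sMaster, the method masterUrl and the fallback value are hypothetical and chosen only for illustration.

```scala
object K8sMaster {
  // Fallback when the variables are absent (assumption: in-cluster DNS name on port 443).
  val DefaultApiServer = "https://kubernetes.default.svc:443"

  // Builds a Spark master URL of the form k8s://https://<host>:<port>
  // from the given environment, falling back to DefaultApiServer.
  // Note: an IPv6 host would need brackets, which this sketch does not add.
  def masterUrl(env: Map[String, String]): String = {
    val apiServer = for {
      host <- env.get("KUBERNETES_SERVICE_HOST")
      port <- env.get("KUBERNETES_SERVICE_PORT")
    } yield s"https://$host:$port"
    "k8s://" + apiServer.getOrElse(DefaultApiServer)
  }
}
```

In the driver this would be used as .master(K8sMaster.masterUrl(sys.env)) in place of the literal address. Taking the environment as a parameter, rather than reading sys.env inside the method, keeps the function easy to check outside a cluster.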