
How to initialize a master in SparkConf in order to run distributed on a k8s cluster?

I have deployed a k8s cluster with 3 nodes and deployed HDFS on it. I wrote a simple PySpark script and want to run it on the k8s cluster, but I don't know how to initialize the Spark context correctly: what should be passed as the master to SparkConf().setMaster()? (When I set the master to k8s://https://172.20.234.174:6443 I get an error.)
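For illustration, this is a minimal sketch of the kind of initialization I mean (sketch only; the address is my cluster's API server):

from pyspark import SparkConf, SparkContext

# setting the k8s master directly in code -- this is what fails for me:
conf = SparkConf().setMaster('k8s://https://172.20.234.174:6443')
sc = SparkContext(conf=conf)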

The script

The command I use to deploy on k8s:

bin/spark-submit \
     --name spark_k8s_hello_world_0 \
     --master k8s://https://172.20.234.174:6443 \
     --deploy-mode cluster \
     --conf spark.executor.instances=2 \
     --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
     --conf spark.kubernetes.container.image=semenchukou/pyspark-k8s-example:0.1 \
     --conf spark.kubernetes.pyspark.pythonVersion=3 \
      local:///app/HelloWorldSpark.py

UPD: the current script:

#!/usr/bin/env python

from pyspark import SparkContext
from pyspark import SparkConf

if __name__ == '__main__':
    conf = SparkConf()
    sc = SparkContext(conf = conf)
    txt = sc.textFile('hdfs://172.20.234.174:1515/testing/testFile.txt')  # read a test file from HDFS
    first = txt.first()
    sc.parallelize(first).saveAsTextFile('hdfs://172.20.234.174:9000/testing/result.txt')  # write the first line back to HDFS

I am running the following command from the master host of the cluster:

bin/spark-submit \
     --name spark_k8s_hello_world_0 \
     --master k8s://https://172.20.234.174:6443 \
     --deploy-mode cluster \
     --conf spark.executor.instances=2 \
     --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
     --conf spark.kubernetes.container.image=semenchukou/pyspark-k8s-example:conf \
     --conf spark.kubernetes.pyspark.pythonVersion=3 \
      local:///app/HelloWorldSpark.py

and I get the following error stack trace:

File "/app/HelloWorldSpark.py", line 8, in <module>
    sc = SparkContext(conf = conf)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 136, in __init__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 198, in _do_init
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 306, in _initialize_context
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [sparkk8shelloworld0-1580920409707-driver]  in namespace: [default]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:185)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
    ... 13 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at okhttp3.Dns$1.lookup(Dns.java:39)
    at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
    at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
    at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:107)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
    at okhttp3.RealCall.execute(RealCall.java:69)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:313)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:296)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:801)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:218)
    ... 20 more
20/02/05 16:33:55 INFO ShutdownHookManager: Shutdown hook called
20/02/05 16:33:55 INFO ShutdownHookManager: Deleting directory /tmp/spark-1399e509-6729-436d-9355-eecec6e58113
20/02/05 16:33:55 INFO ShutdownHookManager: Deleting directory /var/data/spark-3355ab9d-38f1-4083-b7af-5fd03dc1ae2f/spark-2a479391-3a9d-4f5b-93a0-707132c802cc

You shouldn't set anything via SparkConf().setMaster() in your code; the master config value is propagated automatically from spark-submit.
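In other words, your driver code only needs something like the following minimal sketch (the app name is illustrative):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName('hello-world')  # no setMaster() here
sc = SparkContext(conf=conf)  # the master set via spark-submit is picked up automatically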

As the spark-submit --master value you can pass:

  • the Kubernetes API endpoint taken from kubectl cluster-info, with the k8s:// prefix prepended, e.g. k8s://https://xxx.xxx.xxx.xxx:443 (see the sketch after this list)
  • the internal Kubernetes API endpoint k8s://kubernetes.default.svc.cluster.local:443, if you run spark-submit from within the Kubernetes cluster network
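A sketch of the first option for your cluster (the exact wording of the cluster-info output varies by Kubernetes version):

$ kubectl cluster-info
Kubernetes master is running at https://172.20.234.174:6443

# which maps to the spark-submit flag:
--master k8s://https://172.20.234.174:6443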

Also, please refer to the official documentation and try to run the Spark Pi example first.

Hope it helps.

UPDATE 1:

The Spark driver uses another property, spark.kubernetes.driver.master, to set the Kubernetes client URL; when running in cluster mode it defaults to https://kubernetes.default.svc (this is the default internal Kubernetes API endpoint: by default a Kubernetes cluster has a Service named kubernetes in the default namespace): ref1, ref2

In your case, you can try setting the additional --conf spark.kubernetes.driver.master=https://172.20.234.174:6443
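That is, your submit command would become your original command with one extra conf line:

bin/spark-submit \
     --name spark_k8s_hello_world_0 \
     --master k8s://https://172.20.234.174:6443 \
     --deploy-mode cluster \
     --conf spark.executor.instances=2 \
     --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
     --conf spark.kubernetes.container.image=semenchukou/pyspark-k8s-example:conf \
     --conf spark.kubernetes.pyspark.pythonVersion=3 \
     --conf spark.kubernetes.driver.master=https://172.20.234.174:6443 \
      local:///app/HelloWorldSpark.py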

Also, I'd recommend you check that a Service named kubernetes exists in the default namespace of your cluster and that it exposes port 443. If it does, there is probably a problem with DNS resolution inside your cluster, which I guess is another topic. A couple of quick checks are sketched below.
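One way to check both, assuming kubectl access to the cluster (the pod name is arbitrary, and busybox is just one common image choice for a DNS test):

# the Service should exist and expose 443/TCP
kubectl get svc kubernetes -n default

# test in-cluster DNS resolution of the API endpoint
kubectl run dnstest --rm -it --image=busybox --restart=Never -- nslookup kubernetes.default.svc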

UPDATE 2:

The configuration option mentioned above is not available yet; it was only introduced in PR [SPARK-30371]. Until then, the Spark driver always uses https://kubernetes.default.svc:443 to call the Kubernetes API. If you cannot resolve this address from within your cluster, you probably have some issues with DNS or with the cluster setup.
