简体   繁体   English

Spark 作业无法在 Kubernetes 集群上启动

[英]Spark job fails to launch on Kubernetes cluser

I am trying run a spark job on kubernetes cluster with我正在尝试在 kubernetes 集群上运行一个 Spark 作业

./bin/spark-submit     --master k8s://https://<master-ip-addr>:6443    --deploy-mode cluster     --name spark-pi     --class org.apache.spark.examples.SparkPi     --conf spark.executor.instances=5     --conf spark.kubernetes.container.image=<hub-user>/spark     local:///path/to/examples.jar

When I run above command from my local machine I get following error当我从本地机器运行上述命令时,出现以下错误

javax.net.ssl.SSLHandshakeException: 
sun.security.validator.ValidatorException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at 
sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at 
sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1946) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:316) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:310) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1639) at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:223) at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037) at sun.security.ssl.Handshaker.process_record(Handshaker.java:965) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1064) at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1395) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1379) at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281) at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251) at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151) at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195) at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121) at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100) at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:66) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:109) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185) at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135) at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:450) at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:317) at sun.security.validator.Validator.validate(Validator.java:262) at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:330) at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:237) at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:132) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1621) ... 39 more Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141) at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126) at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:445) ... 45 more Exception in thread "kubernetes-dispatcher-0" Exception in thread "main" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@22c3ccc0 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@1ef5330b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326) at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533) at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:632) at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678) at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.scheduleReconnect(WatchConnectionManager.java:300) at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$800(WatchConnectionManager.java:48) at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:213) at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543) at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208) at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148) at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:204) at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543) at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208) at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148) at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1946) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:316) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:310) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1639) at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:223) at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037) at sun.security.ssl.Handshaker.process_record(Handshaker.java:965) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1064) at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1395) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1379) at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281) at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251) at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151) at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195) at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121) at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100) at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:66) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:109) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185) at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135) ... 4 more Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:450) at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:317) at sun.security.validator.Validator.validate(Validator.java:262) at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:330) at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:237) at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:132) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1621) ... 39 more Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141) at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126) at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:445) ... 45 more 2020-02-09 16:57:32 INFO ShutdownHookManager:58 - Shutdown hook called

I dont understand this error.我不明白这个错误。 Whats going wrong ?怎么了? How do I resolve it ?我该如何解决?

According to stacktrace, it is caused when the Java environment does not have information about the HTTPS server to verify that it is a valid website.根据stacktrace,这是由于Java环境没有HTTPS服务器的信息来验证它是一个有效的网站时导致的。

sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

keytool to fix SSL issue解决 SSL 问题的keytool

You can use keytool :您可以使用keytool

See this answer or that answer看到这个答案那个答案

  • Check if your certificate is already in the truststore by running the following command: keytool -list -keystore "%JAVA_HOME%/jre/lib/security/cacerts"通过运行以下命令检查您的证书是否已在信任库中: keytool -list -keystore "%JAVA_HOME%/jre/lib/security/cacerts"

  • Add the certificate for <master-ip-addr> to the truststore file of the used JVM located at %JAVA_HOME%\\lib\\security\\cacerts :<master-ip-addr>的证书添加到位于%JAVA_HOME%\\lib\\security\\cacerts的所用 JVM 的信任库文件中:

    • If your certificate is missing, you can get it by downloading it with your browser and add it to the truststore with the following command: keytool -import -noprompt -trustcacerts -alias <AliasName> -file <certificate> -keystore <KeystoreFile> -storepass <Password>如果您的证书丢失,您可以通过使用浏览器下载并使用以下命令将其添加到信任库来获取它: keytool -import -noprompt -trustcacerts -alias <AliasName> -file <certificate> -keystore <KeystoreFile> -storepass <Password>

    • After import you can run the first command again to check if your certificate was added.导入后,您可以再次运行第一个命令以检查您的证书是否已添加。

  • Restart your JVM重启你的 JVM

InstallCert.java安装证书

  • Get InstallCert.java获取InstallCert.java
  • Run [InstallCert.java, with your hostname and https port, and press 1 when ask for input.使用您的主机名和 https 端口运行 [InstallCert.java,并在要求输入时按 1。 It will add your localhost as a trusted keystore, and generates a file jssecacerts它会将您的本地主机添加为受信任的密钥库,并生成一个文件jssecacerts
java InstallCert <master-ip-addr>:6443

You have either to configure the connection to your Kubernetes Cluster like you would do for kubectl by placing the Kubeconfig file in a file ~/.kube/cofig or set the spark submit parameters for authentication correctly您必须像对kubectl一样配置与 Kubernetes 集群的连接, kubectl是将 Kubeconfig 文件放在文件 ~/.kube/cofig 中,或者正确设置用于身份验证的 spark 提交参数

spark-submit \
... \
--conf spark.kubernetes.authenticate.caCertFile=path_to_cert_file \
--conf spark.kubernetes.authenticate.oauthTokenFile=path_to_token \
file_to_execute.jar

You have to use different parameters depending if you start in cluster or client mode All possible configurations for authentication are listed in the documentation https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration您必须使用不同的参数,具体取决于您是在集群模式还是客户端模式下启动所有可能的身份验证配置都在文档https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration中列出

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 kubernetes 上的 Spark 作业失败,没有特定错误 - Spark job on kubernetes fails without specific error 启动并提交工作火花 - launch and submit job spark 在 Kubernetes 上使用 Kafka 进行 Spark 作业 - Spark Job with Kafka on Kubernetes Spark 执行器在 Kube.netes (EKS) 上失败——缺少 podName - Spark executors fails on Kubernetes (EKS) --podName is missing 在启用 kerberos 的纱线集群上启动 spark/spring-boot 作业 - launch spark/spring-boot job on yarn cluster with kerberos enable 输入字符串ea的java 9 NumberFormatException上的Spark作业失败 - Spark job fails on java 9 NumberFormatException for input string ea 由于容器启动时的AM容器异常,无头环境中的MapReduce作业失败了N次 - MapReduce job in headless environment fails N times due to AM Container exception from container-launch 本地 microK8 的 Kubernetes 集群上的 spark-submit 失败:java.security.cert.CertPathValidatorException - spark-submit on local microK8's Kubernetes cluster fails with: java.security.cert.CertPathValidatorException 无法使用Java启动Spark - Cannot launch spark with Java 由于某些未知原因,由于执行器丢失,Spark Job在saveAsHadoopDataset阶段失败 - Spark Job fails at saveAsHadoopDataset stage due to Lost Executor due to some unknown reason
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM