
ApplicationMaster not able to find Spark Driver despite binding address set (Cluster Mode Yarn)

I have a 3-node cluster, and the UIs show that everything is well connected. If I submit a Spark application with deploy mode `cluster`, I get: `java.net.BindException: Cannot assign requested address: bind: Service 'sparkDriver' failed`. The full error is in the log below (the log of one of the slaves); when the application happens to be launched on the current node (VM2), it runs fine.
The Spark session is defined as follows:

 SparkSession spark = SparkSession.builder().enableHiveSupport().appName("sparkApp")
.master("yarn").config("spark.driver.host","VM2").getOrCreate();

 2021-11-09 17:59:52,149 ERROR yarn.ApplicationMaster: Uncaught exception:
 org.apache.spark.SparkException: Exception thrown in awaitResult:
     at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
     at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:504)
     at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:899)
     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:898)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
     at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
     at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
 Caused by: java.net.BindException: Cannot assign requested address: bind: Service 'sparkDriver' failed after 16 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
     at sun.nio.ch.Net.bind0(Native Method)
     at sun.nio.ch.Net.bind(Net.java:438)
     at sun.nio.ch.Net.bind(Net.java:430)
     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:225)
     at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:134)
     at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:550)
     at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
     at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
     at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
     at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
     at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:248)
     at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)
     at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
     at java.lang.Thread.run(Thread.java:748)

I can also see through the YARN UI that if the first attempt of the application execution lands on VM2 (the current node) then it runs; otherwise it does not (unless a later attempt lands on VM2).

I think you should alter your code:

SparkSession spark = SparkSession.builder().enableHiveSupport().appName("sparkApp").master("yarn").getOrCreate();

You are using YARN (not standalone), so you do not need to specify the driver host; YARN does the assignment for you.
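With the hard-coded driver host removed, the application would then be submitted in cluster mode as sketched below (the class name and jar path are placeholders, not taken from the question):

```shell
# In YARN cluster mode the driver runs inside the ApplicationMaster on
# whichever node YARN picks, so the application code must not pin the
# driver to a fixed host such as VM2.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.SparkApp \
  sparkApp.jar
```

The original error occurred because `spark.driver.host` was fixed to `VM2`: whenever YARN scheduled the ApplicationMaster on a different node, the driver tried to bind a socket to an address that node does not own, which is exactly what `Cannot assign requested address` means.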

The documentation does say:

spark.driver.host: Hostname or IP address for the driver. This is used for communicating with the executors and the standalone Master.
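If you ever do need to control the bind address (as the exception message itself suggests via `spark.driver.bindAddress`), it is safer to pass it at submit time than to hard-code it in the application, since in cluster mode the driver can land on any node. A hypothetical variant, with placeholder class and jar names:

```shell
# Hypothetical: let the driver bind to all interfaces on whatever node
# it ends up on, rather than to one fixed hostname.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --class com.example.SparkApp \
  sparkApp.jar
```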

