简体   繁体   中英

How to programatically submit a spark application in yarn-client mode?

I have a simple spark job which replaces spaces with commas in a given input file.

When this job is submitted locally (Using IDE and executing the built jar) it completes successfully and when the master is set to "yarn-client" the job hangs for very long time and throws the following exception.

We have a usecase where we want to submit the job programatically rather than building a jar and submitting it through spark-submit.

Spark version : 1.6.1 Hadoop version : 2.7.1

and i got all the spark, yarn and hadoop dependencies in my pom.

Job failed due to following exception

java.net.ConnectException: Call From spark.node123.com/192.168.2.1 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor13.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy10.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:152)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy11.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:246)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:129)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:129)
    at org.apache.spark.Logging$class.logInfo(Logging.scala:58)
    at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:62)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:128)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at tardis.platform.TardisContext$.apply(TardisContext.scala:20)
    at tardis.common.plugins.Heartbeat.isAbleTocreateContext(Heartbeat.scala:45)
    at tardis.common.plugins.Heartbeat.performAction(Heartbeat.scala:33)
    at tardis.core.scheduler.jobs.PluginExecutorJob.execute(PluginExecutorJob.scala:40)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
    at org.apache.hadoop.ipc.Client.call(Client.java:1446)
    ... 25 more 

我必须添加hadoop和yarn配置,才能在yarn-client模式下成功提交应用程序。

You can not remotely submit your spark job in client mode since your computer have to run the driver program itself which require a lot of connection. If you insist using this method, you have to config your firewall to allow some port to connect to the cluster. Using cluster mode or submit it from master node are much less painful.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM