简体   繁体   中英

spark-submit continues to hang after job completion

I am trying to test spark 1.6 with hdfs in AWS. I am using the wordcount python example available in the examples folder. I submit the job with spark-submit, the job completes successfully and its prints the results on the console as well. The web-UI also says its completed. However the spark-submit never terminates. I have verified that the context is stopped in the word count example code as well.

What could be wrong ?

This is what I see on the console.

6-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
2016-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages/json,null}
2016-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages,null}
2016-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
2016-05-24 14:58:04,750 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
2016-05-24 14:58:04,750 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
2016-05-24 14:58:04,750 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs,null}
2016-05-24 14:58:04,802 INFO  [Thread-3] ui.SparkUI (Logging.scala:logInfo(58)) - Stopped Spark web UI at http://172.30.2.239:4040
2016-05-24 14:58:04,805 INFO  [Thread-3] cluster.SparkDeploySchedulerBackend (Logging.scala:logInfo(58)) - Shutting down all executors
2016-05-24 14:58:04,805 INFO  [dispatcher-event-loop-2] cluster.SparkDeploySchedulerBackend (Logging.scala:logInfo(58)) - Asking each executor to shut down
2016-05-24 14:58:04,814 INFO  [dispatcher-event-loop-5] spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(58)) - MapOutputTrackerMasterEndpoint stopped!
2016-05-24 14:58:04,818 INFO  [Thread-3] storage.MemoryStore (Logging.scala:logInfo(58)) - MemoryStore cleared
2016-05-24 14:58:04,818 INFO  [Thread-3] storage.BlockManager (Logging.scala:logInfo(58)) - BlockManager stopped
2016-05-24 14:58:04,820 INFO  [Thread-3] storage.BlockManagerMaster (Logging.scala:logInfo(58)) - BlockManagerMaster stopped
2016-05-24 14:58:04,821 INFO  [dispatcher-event-loop-3] scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint (Logging.scala:logInfo(58)) - OutputCommitCoordinator stopped!
2016-05-24 14:58:04,824 INFO  [Thread-3] spark.SparkContext (Logging.scala:logInfo(58)) - Successfully stopped SparkContext
2016-05-24 14:58:04,827 INFO  [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Shutting down remote daemon.
2016-05-24 14:58:04,828 INFO  [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Remote daemon shut down; proceeding with flushing remote transports.
2016-05-24 14:58:04,843 INFO  [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting shut down.

I have to do a ctrl-c to terminate the spark-submit process. This is really a weird problem and I have no idea how to fix this. Please let me know if there are any logs I should be looking at, or doing things differently here.

Here is the pastebin link of the jstack output of spark-submit process: http://pastebin.com/Nfnt4XmT

I had the same issue with custom thread pool in my spark job code. I found out that spark-submit hangs with using custom non daemonic thread pools in your code. You could check ThreadUtils.newDaemonCachedThreadPool() to understand how spark developers create thread pools or you could use this utils but be careful as they are package private.

For me: On local it stops but keeps sending logs when executed on Cluster. I think it is expected behaviour. Got this from another answer and worked for me.

val sc=new SparkContext(conf)
try{
      //code goes here
}
finally{
      sc.stop()
}

This technique worked both in Spark and PySpark. sample shot

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM