
spark-submit continues to hang after job completion

I am trying to test Spark 1.6 with HDFS in AWS. I am using the wordcount Python example available in the examples folder. I submit the job with spark-submit; the job completes successfully and prints the results on the console. The web UI also shows it as completed. However, spark-submit never terminates. I have verified that the context is stopped in the word count example code as well.

What could be wrong?

This is what I see on the console.

2016-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
2016-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages/json,null}
2016-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages,null}
2016-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
2016-05-24 14:58:04,750 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
2016-05-24 14:58:04,750 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
2016-05-24 14:58:04,750 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs,null}
2016-05-24 14:58:04,802 INFO  [Thread-3] ui.SparkUI (Logging.scala:logInfo(58)) - Stopped Spark web UI at http://172.30.2.239:4040
2016-05-24 14:58:04,805 INFO  [Thread-3] cluster.SparkDeploySchedulerBackend (Logging.scala:logInfo(58)) - Shutting down all executors
2016-05-24 14:58:04,805 INFO  [dispatcher-event-loop-2] cluster.SparkDeploySchedulerBackend (Logging.scala:logInfo(58)) - Asking each executor to shut down
2016-05-24 14:58:04,814 INFO  [dispatcher-event-loop-5] spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(58)) - MapOutputTrackerMasterEndpoint stopped!
2016-05-24 14:58:04,818 INFO  [Thread-3] storage.MemoryStore (Logging.scala:logInfo(58)) - MemoryStore cleared
2016-05-24 14:58:04,818 INFO  [Thread-3] storage.BlockManager (Logging.scala:logInfo(58)) - BlockManager stopped
2016-05-24 14:58:04,820 INFO  [Thread-3] storage.BlockManagerMaster (Logging.scala:logInfo(58)) - BlockManagerMaster stopped
2016-05-24 14:58:04,821 INFO  [dispatcher-event-loop-3] scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint (Logging.scala:logInfo(58)) - OutputCommitCoordinator stopped!
2016-05-24 14:58:04,824 INFO  [Thread-3] spark.SparkContext (Logging.scala:logInfo(58)) - Successfully stopped SparkContext
2016-05-24 14:58:04,827 INFO  [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Shutting down remote daemon.
2016-05-24 14:58:04,828 INFO  [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Remote daemon shut down; proceeding with flushing remote transports.
2016-05-24 14:58:04,843 INFO  [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting shut down.

I have to press ctrl-c to terminate the spark-submit process. This is really a weird problem and I have no idea how to fix it. Please let me know if there are any logs I should be looking at, or if I should be doing things differently here.

Here is the pastebin link to the jstack output of the spark-submit process: http://pastebin.com/Nfnt4XmT
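The jstack dump is essentially a list of which threads are still alive inside the spark-submit JVM. As a rough illustration of what to look for (a diagnostic sketch, not from the original post, and only applicable to a JVM driver), this prints any live non-daemon threads; anything listed besides "main" can keep the process from exiting after the SparkContext has been stopped:

import scala.collection.JavaConverters._

// Illustrative only: list live non-daemon threads in the current JVM.
Thread.getAllStackTraces.keySet.asScala
  .filter(t => t.isAlive && !t.isDaemon)
  .foreach(t => println(s"non-daemon thread still alive: ${t.getName}"))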

I had the same issue with a custom thread pool in my Spark job code. I found out that spark-submit hangs when your code uses custom non-daemon thread pools. You can look at ThreadUtils.newDaemonCachedThreadPool() to see how the Spark developers create their thread pools, or use those utilities yourself, but be careful because they are package-private.
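As a sketch of what that means in practice, here is a minimal daemon-backed cached pool built with the plain JDK API instead of Spark's package-private ThreadUtils (the thread name is just a placeholder):

import java.util.concurrent.{Executors, ExecutorService, ThreadFactory}

// Threads created by this factory are daemons, so they will not keep the
// driver JVM (and hence spark-submit) alive once the main thread returns.
val daemonFactory = new ThreadFactory {
  override def newThread(r: Runnable): Thread = {
    val t = new Thread(r, "my-worker")   // name is illustrative
    t.setDaemon(true)
    t
  }
}

val pool: ExecutorService = Executors.newCachedThreadPool(daemonFactory)
// Alternatively, keep a non-daemon pool but call pool.shutdown() before the job ends.

If the pool is left non-daemon and never shut down, the JVM stays alive even after SparkContext.stop(), which matches the hang described above.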

For me: locally it stops, but it keeps sending logs when executed on the cluster. I think that is expected behaviour. I got this from another answer and it worked for me.

val sc = new SparkContext(conf)
try {
  // code goes here
} finally {
  sc.stop()
}

This technique worked in both Spark and PySpark.
