
How to automatically restart a failed node in Spark Streaming?

I'm using Spark on a cluster in Standalone mode.

I'm currently working on a Spark Streaming application. I've added checkpointing so the system can recover if the master process suddenly fails, and I can see that it's working well.

My question is: what happens if an entire node crashes (power failure, hardware error, etc.)? Is there a way to automatically identify failed nodes in the cluster and, if so, restart them on the same machine (or on a different machine instead)?

I've looked at monit, but it seems to run on a specific machine and restart failed processes, whereas I need to do the same thing across nodes. Just to be clear, I don't mind if the restart operation takes a little time, but I would prefer it to happen automatically.

Is there any way to do this?

Thanks in advance

Spark Standalone has some support for high availability, at least for the master node, as described in the official documentation.
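As a rough sketch of what that master-side HA looks like, the documented approach uses a ZooKeeper ensemble for leader election and state recovery. The snippet below is a hypothetical `spark-env.sh` excerpt for each standby master; the ZooKeeper hosts and the znode directory are placeholders:

```shell
# Hypothetical spark-env.sh fragment on every master host.
# zk1/zk2 and the /spark znode path are placeholders for your environment.
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```

With this in place you can start multiple masters; if the active one dies, a standby takes over and running applications reconnect to it.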

When a worker node dies, Spark will schedule jobs on other nodes, which works more or less with Spark Streaming as well.
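For the driver process specifically, one relevant option (assuming you submit in `cluster` deploy mode on Standalone) is the `--supervise` flag of `spark-submit`, which asks the master to restart the driver on another worker if it exits with a non-zero status, e.g. because its node died. The master URL and jar path below are placeholders:

```shell
# Hypothetical submission; spark://master-host:7077 and the jar path
# are placeholders. --supervise makes the Standalone master restart
# the driver automatically if it fails.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  my-streaming-app.jar
```

Combined with Spark Streaming checkpointing, the restarted driver can then recover its state from the checkpoint directory.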

Other than that, you need some cluster management and monitoring tools.
