
Failure recovery in Spark running on HDInsight

I was trying to get Apache Spark running on Azure HDInsight by following the steps from http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-spark-install/.

I was wondering if I have to manage master/slave failure recovery myself, or whether HDInsight will take care of it.

I'm also working on Spark Streaming applications on Azure HDInsight. Inside a Spark job, Spark and YARN can provide some fault tolerance for the master and the slaves, mostly through their retry mechanisms (a configuration sketch is given below).
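
As a rough illustration of the retry behaviour Spark on YARN offers out of the box, here is a minimal Scala sketch. The application name and the retry counts are assumptions for illustration only; the property keys themselves are standard Spark-on-YARN options, but check the version running on your HDInsight cluster.

```scala
import org.apache.spark.SparkConf

object FaultToleranceConf {
  // Hypothetical app name; the retry counts below are illustrative, not recommendations.
  val conf = new SparkConf()
    .setAppName("MyHDInsightApp")
    .set("spark.task.maxFailures", "8")            // retries per task before the stage fails
    .set("spark.yarn.max.executor.failures", "16") // executor (slave-side) failures tolerated
  // Driver/ApplicationMaster retries are governed by spark.yarn.maxAppAttempts, which has to
  // be supplied at submission time (e.g. --conf spark.yarn.maxAppAttempts=4) because in
  // yarn-cluster mode the AM starts before any user code runs.
}
```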

  1. But sometimes the driver and the workers will still crash, due to user-code errors, Spark internal issues, or Azure HDInsight issues. So we need to build our own monitoring/daemon process and handle the recovery ourselves.
  2. For streaming scenarios it is even harder. Since a Spark Streaming job needs to keep running 24/7, the concern is how to make the job recover from machine reboots and reimages (see the checkpoint-recovery sketch after this list).
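
For the streaming case, the usual building block for driver recovery is Spark Streaming's checkpoint mechanism via StreamingContext.getOrCreate. The sketch below assumes a hypothetical checkpoint directory on the cluster's default WASB storage and a placeholder socket source; it is not the poster's actual job.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RecoverableStreamingJob {
  // Hypothetical checkpoint location on the HDInsight cluster's default storage (WASB).
  val checkpointDir = "wasb:///checkpoints/my-streaming-app"

  // Builds a fresh context; only invoked when no checkpoint exists yet.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("RecoverableStreamingJob")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // Placeholder source and output operation; replace with the real DStream pipeline.
    ssc.socketTextStream("localhost", 9999).count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Restores the job from the checkpoint after a driver restart, or creates it on first run.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that checkpoint recovery only restores the streaming job's state after the driver comes back; an external monitoring/daemon process (as in point 1) is still needed to resubmit the application after a node reboot or reimage.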
