
Failure recovery in Spark running on HDInsight

I was trying to get Apache Spark running on Azure HDInsight by following the steps from http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-spark-install/

I was wondering whether I have to manage master/slave failure recovery myself, or whether HDInsight takes care of it.

I'm also working on Spark Streaming applications on Azure HDInsight. Within a Spark job, Spark and YARN provide some fault tolerance for the master and the slaves.
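For context, here is a minimal sketch of what that built-in fault tolerance looks like in practice. The config keys (spark.task.maxFailures, spark.yarn.maxAppAttempts) are standard Spark-on-YARN settings, but the values, paths, and app name are illustrative assumptions:

    import org.apache.spark.{SparkConf, SparkContext}

    // Driver restarts are governed by YARN, so that setting is normally
    // passed at submit time (it is read by the YARN client, not the driver):
    //   spark-submit --master yarn-cluster --conf spark.yarn.maxAppAttempts=3 ...
    val conf = new SparkConf()
      .setAppName("FaultTolerantBatchJob") // illustrative name
      // Retry each failed task up to 4 times before failing the stage
      // (4 is Spark's default).
      .set("spark.task.maxFailures", "4")

    val sc = new SparkContext(conf)

    // If an executor is lost, the driver reschedules its tasks on healthy
    // nodes and recomputes lost RDD partitions from lineage.
    val counts = sc.textFile("wasb:///example/data/sample.log") // hypothetical path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("wasb:///example/output/wordcounts")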

  1. Sometimes, though, the driver and workers can also crash due to user-code errors, Spark internal issues, or Azure HDInsight issues. So we need our own monitoring/daemon process and have to handle the recovery ourselves.
  2. For streaming scenarios it is even harder. Since a Spark Streaming job needs to keep running 24/7, the concern is how to make the job recover from machine reboots and reimages (see the sketch after this list).
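For point 2, the usual building block is Spark Streaming's checkpoint-based driver recovery via StreamingContext.getOrCreate: on a restart, the context (DStream graph and pending batches) is rebuilt from the checkpoint instead of from scratch. A minimal sketch, assuming a hypothetical checkpoint directory and a socket source on localhost:9999:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Hypothetical checkpoint location on the cluster's default Azure storage.
    val checkpointDir = "wasb:///checkpoints/my-streaming-app"

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("RecoverableStreamingApp")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)

      // Example workload: count words arriving on a socket stream.
      ssc.socketTextStream("localhost", 9999)
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1L))
        .reduceByKey(_ + _)
        .print()
      ssc
    }

    // First start builds a fresh context; after a crash, reboot, or reimage,
    // the context is recovered from the checkpoint and the job resumes.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()

Note that checkpointing alone does not resubmit the application; you still need an external supervisor (YARN application attempts, or your own daemon) to restart the driver process, which is why the monitoring process from point 1 matters.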
