
How to make a Spark Streaming job run perpetually on HDInsight (YARN)?

I am developing a Spark application in IntelliJ that runs on an HDInsight cluster (YARN-based). Currently, I submit jobs directly from IntelliJ through the Azure HDInsight plug-in, which in turn uses the Livy API to submit the job remotely.

Once I am done developing the code, I would like the streaming job to run perpetually. Currently, if the job fails five times, the program stops and doesn't restart itself. Is there any way to change this behavior? Or what solution do most people use to make Spark restart after a failure?

Restarts of Spark jobs on YARN are controlled by YARN settings, so you need to increase the number of attempts YARN allows for the Spark application (the YARN application master). I believe the setting is yarn.resourcemanager.am.max-attempts. In HDInsight, open the Ambari UI and change this setting under YARN -> Configs -> Advanced yarn-site.
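Besides the cluster-wide YARN limit, Spark has its own per-application settings, spark.yarn.maxAppAttempts and spark.yarn.am.attemptFailuresValidityInterval. The first should not exceed the YARN limit, and the second makes YARN forget failures older than the given interval, which matters for a long-running streaming job. Here is a minimal Python sketch that builds these settings as a dictionary; the function name `restart_conf` and the example values are hypothetical, not something from the question.

```python
def restart_conf(max_app_attempts, cluster_max_attempts, validity_interval="1h"):
    """Build Spark-on-YARN settings that govern application master retries.

    cluster_max_attempts is whatever yarn.resourcemanager.am.max-attempts
    is set to on the cluster (assumed to be looked up by hand in Ambari).
    """
    if max_app_attempts < 1:
        raise ValueError("max_app_attempts must be at least 1")
    if max_app_attempts > cluster_max_attempts:
        # Spark's per-application limit should not exceed the YARN-wide limit.
        raise ValueError(
            "spark.yarn.maxAppAttempts should not exceed "
            "yarn.resourcemanager.am.max-attempts"
        )
    return {
        "spark.yarn.maxAppAttempts": str(max_app_attempts),
        # Failures older than this interval no longer count against the limit.
        "spark.yarn.am.attemptFailuresValidityInterval": validity_interval,
    }
```

For example, `restart_conf(4, cluster_max_attempts=5)` yields settings that could be passed with `--conf` to spark-submit or in the `conf` field of a Livy request.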

To submit a production job, you can use the Livy API directly, as described here: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-eventhub-streaming#run-the-application-remotely-on-a-spark-cluster-using-livy
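As a rough illustration of a direct Livy submission, the Python sketch below builds (but does not send) a POST request to the cluster's /livy/batches endpoint using only the standard library. The cluster name, credentials, jar path, and class name are placeholders you would replace, and the helper name `build_livy_batch_request` is hypothetical.

```python
import base64
import json
import urllib.request


def build_livy_batch_request(cluster, user, password, jar_path, class_name, conf=None):
    """Build a POST request that submits a Spark batch job through Livy.

    All argument values are placeholders supplied by the caller; jar_path is
    assumed to point at a jar already uploaded to the cluster's storage.
    """
    payload = {"file": jar_path, "className": class_name}
    if conf:
        # Optional Spark settings, e.g. retry-related ones.
        payload["conf"] = conf

    # HDInsight protects the Livy endpoint with HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    return urllib.request.Request(
        url=f"https://{cluster}.azurehdinsight.net/livy/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Requested-By": user,
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would actually submit the job; building the request separately makes it easy to inspect the URL and JSON body first.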

