
Spark High Availability

I'm using Spark 1.2.1 on three nodes that run three workers via the slaves configuration, and I run daily jobs using:

./spark-1.2.1/sbin/start-all.sh

# crontab configuration:
./spark-1.2.1/bin/spark-submit --master spark://11.11.11.11:7077 --driver-class-path /home/ubuntu/spark-cassandra-connector-java-assembly-1.2.1-FAT.jar --class "$class" "$jar"
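For context, the full crontab line would look roughly like the following; the 02:00 schedule and the log path are placeholders, and "$class" / "$jar" stand in for the actual job class and assembly jar as above:

# hypothetical crontab entry: run the job daily at 02:00 and capture its output
0 2 * * * /home/ubuntu/spark-1.2.1/bin/spark-submit --master spark://11.11.11.11:7077 --driver-class-path /home/ubuntu/spark-cassandra-connector-java-assembly-1.2.1-FAT.jar --class "$class" "$jar" >> /home/ubuntu/spark-cron.log 2>&1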

I want to keep the Spark master and the slave workers available at all times, so that even if one of them fails it gets restarted like a service (the way Cassandra does).

Is there any way to do it?

EDIT:

I looked into the start-all.sh script and it only contains calls to the start-master.sh and start-slaves.sh scripts. I tried to create a supervisor configuration file for it, but I only get the errors below:

11.11.11.11: ssh: connect to host 11.11.11.12 port 22: No route to host
11.11.11.13: org.apache.spark.deploy.worker.Worker running as process 14627. Stop it first.
11.11.11.11: ssh: connect to host 11.11.11.12 port 22: No route to host
11.11.11.12: ssh: connect to host 11.11.11.13 port 22: No route to host
11.11.11.11: org.apache.spark.deploy.worker.Worker running as process 14627. Stop it first.
11.11.11.12: ssh: connect to host 11.11.11.12 port 22: No route to host
11.11.11.13: ssh: connect to host 11.11.11.13 port 22: No route to host
11.11.11.11: org.apache.spark.deploy.worker.Worker running as process 14627. Stop it first.

There are tools such as monit and supervisor (or even systemd) that can monitor and restart failed processes.
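Note that start-all.sh itself is a poor target for a process supervisor: it SSHes to the other nodes, launches the master and worker daemons in the background, and then exits, so the supervisor has nothing long-running to watch, and re-running it while the daemons are still up produces the "Stop it first" messages shown above. A common workaround is to supervise the master and worker JVMs directly on each node, launching them in the foreground through spark-class. Below is a minimal supervisord sketch along those lines; the installation path, user and log locations are assumptions, and the master address mirrors the spark://11.11.11.11:7077 from the question.

; /etc/supervisor/conf.d/spark-master.conf  (on the master node; paths are assumptions)
[program:spark-master]
; spark-class keeps the Master in the foreground, unlike start-master.sh which daemonizes
command=/home/ubuntu/spark-1.2.1/bin/spark-class org.apache.spark.deploy.master.Master --host 11.11.11.11 --port 7077 --webui-port 8080
user=ubuntu
autostart=true
autorestart=true
stdout_logfile=/var/log/spark/master.log
redirect_stderr=true

; /etc/supervisor/conf.d/spark-worker.conf  (on each worker node)
[program:spark-worker]
command=/home/ubuntu/spark-1.2.1/bin/spark-class org.apache.spark.deploy.worker.Worker spark://11.11.11.11:7077
user=ubuntu
autostart=true
autorestart=true
stdout_logfile=/var/log/spark/worker.log
redirect_stderr=true

After placing the files, supervisorctl reread followed by supervisorctl update should start both programs and restart them automatically if they die; a systemd unit with Restart=always (or a monit check) can achieve the same effect if you prefer those tools.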
