
How to: run Spark on a YARN cluster

I have set up a Hadoop cluster with 3 machines, one master and 2 slaves. On the master I built Spark with:

SPARK_HADOOP_VERSION=2.4.0 SPARK_YARN=true sbt/sbt clean assembly

I added HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop to spark-env.sh.
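In other words, conf/spark-env.sh now contains the line:

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop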

Then I ran:

SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.4.0.jar \
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop \
./bin/spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 \
  examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.4.0.jar

I checked the YARN ResourceManager UI at localhost:8088 and saw the SparkPi application running.

Is this all I need to do, or should I also install Spark on the 2 slave machines? How can I get all the machines participating?

Is there a help doc out there? I feel like I am missing something.

In Spark standalone mode we start the master and the workers with ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT
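For example, in standalone mode I would start things roughly like this (master-host and port 7077 are placeholders; the actual URL is shown on the master's web UI):

./sbin/start-master.sh
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077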

I also wanted to know how to get more than one worker running in this case as well.

I also know we can configure slaves in conf/slaves, but can anyone share an example?
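From what I have read, conf/slaves should just list one worker hostname per line, something like this (the hostnames below stand in for my two slave machines):

slave1
slave2

so that ./sbin/start-all.sh starts a master locally and a worker on each listed host over SSH. But I am not sure if that is all that is needed.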

Please help i am stuck

Assuming you're using Spark 1.1.0, as stated in the documentation (http://spark.apache.org/docs/1.1.0/submitting-applications.html#master-urls), for the --master parameter you can use the values yarn-cluster or yarn-client. You do not need the --deploy-mode parameter in that case.
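For example, reusing the jars and settings from your command, the submission would look roughly like this (HADOOP_CONF_DIR still has to point at your Hadoop configuration):

HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop \
./bin/spark-submit --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 \
  examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.4.0.jar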

You do not have to install Spark on all the YARN nodes. That is what YARN is for: to distribute your application (in this case Spark) over a Hadoop cluster.
