
How to: Spark YARN cluster

I have set up a Hadoop cluster with 3 machines, one master and 2 slaves. On the master I have built Spark with:

SPARK_HADOOP_VERSION=2.4.0 SPARK_YARN=true sbt/sbt clean assembly

Added HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop to spark-env.sh.
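For reference, a minimal sketch of that entry in conf/spark-env.sh (the path is the one from the question):

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop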

Then I ran:

SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.4.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop ./bin/spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.4.0.jar

I checked localhost:8088 and saw the application SparkPi running.

Is it just this, or should I also install Spark on the 2 slave machines? How can I get all the machines started?

Is there any help doc out there? I feel like I am missing something.

In Spark standalone mode we start the master and the workers with:

./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT

I also wanted to know how to get more than one worker running in this case as well.
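(For the standalone setup mentioned above, one hedged suggestion not taken from the original post: the number of worker processes per machine can be set in conf/spark-env.sh, for example:

export SPARK_WORKER_INSTANCES=2   # run two worker processes on this machine
export SPARK_WORKER_CORES=1       # cores allotted to each worker

The values here are illustrative only.)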

And I know we can configure slaves in conf/slaves, but can anyone share an example?
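(As an illustration, not from the original post: conf/slaves in a standalone setup is simply a list of worker hostnames, one per line. The hostnames below are placeholders:

slave1
slave2

Spark's sbin/start-slaves.sh then starts a worker on each listed host.)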

Please help, I am stuck.

Assuming you're using Spark 1.1.0, as it says in the documentation ( http://spark.apache.org/docs/1.1.0/submitting-applications.html#master-urls ), for the master parameter you can use the values yarn-cluster or yarn-client. You do not need to use the deploy-mode parameter in that case.
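For example, the spark-submit command from the question could then be written as follows (a sketch; the jar path and resource settings are the ones from the question):

HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop ./bin/spark-submit --master yarn-cluster --class org.apache.spark.examples.SparkPi --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.4.0.jar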

You do not have to install Spark on all the YARN nodes. That is what YARN is for: to distribute your application (in this case Spark) over a Hadoop cluster.
