
SPARK_EXECUTOR_INSTANCES not working in SPARK SHELL, YARN CLIENT MODE

I am new to Spark.

I am trying to run Spark on YARN in yarn-client mode.

SPARK VERSION = 1.0.2, HADOOP VERSION = 2.2.0

The YARN cluster has 3 live nodes.

Properties set in spark-env.sh:

SPARK_EXECUTOR_MEMORY=1G

SPARK_EXECUTOR_INSTANCES=3

SPARK_EXECUTOR_CORES=1

SPARK_DRIVER_MEMORY=2G
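For reference, spark-env.sh is a shell script sourced at startup, so these settings are conventionally written as exports (a sketch; values are the ones listed above, adjust to your cluster):

```shell
# conf/spark-env.sh -- executor settings for yarn-client mode (Spark 1.x)
export SPARK_EXECUTOR_MEMORY=1G     # memory per executor
export SPARK_EXECUTOR_INSTANCES=3   # number of executors requested
export SPARK_EXECUTOR_CORES=1       # cores per executor
export SPARK_DRIVER_MEMORY=2G       # driver JVM heap
```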

Command used: /bin/spark-shell --master yarn-client

But after logging into spark-shell, it registers only 1 executor, with some default memory assigned to it.

I confirmed via the Spark web UI as well that there is only 1 executor, and that it runs on the master node (the YARN resource manager node) only.

INFO yarn.Client: Command for starting the Spark ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx2048m, -Djava.io.tmpdir=$PWD/tmp, -Dspark.tachyonStore.folderName="spark-fc6383cc-0904-4af9-8abd-3b66b3f0f461", -Dspark.yarn.secondary.jars="", -Dspark.home="/home/impadmin/spark-1.0.2-bin-hadoop2", -Dspark.repl.class.uri="http://master_node:46823", -Dspark.driver.host="master_node", -Dspark.app.name="Spark shell", -Dspark.jars="", -Dspark.fileserver.uri="http://master_node:46267", -Dspark.master="yarn-client", -Dspark.driver.port="41209", -Dspark.httpBroadcast.uri="http://master_node:36965", -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar, null, --args 'master_node:41209', --executor-memory, 1024, --executor-cores, 1, --num-executors, 3, 1>, /stdout, 2>, /stderr)

...

14/09/10 22:21:24 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@master_node:53619/user/Executor#1075999905] with ID 1

14/09/10 22:21:24 INFO storage.BlockManagerInfo: Registering block manager master_node:40205 with 589.2 MB RAM

14/09/10 22:21:25 INFO cluster.YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done

14/09/10 22:21:25 INFO repl.SparkILoop: Created spark context.. Spark context available as sc.

And after running any Spark action, with any degree of parallelism, it simply runs all the tasks serially on this one node!

OK, I solved it this way. I have 4 data nodes on my cluster:

spark-shell --num-executors 4 --master yarn-client
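The same idea extends to the other executor settings from spark-env.sh: passing them explicitly as command-line flags tends to be the more reliable way to request resources in yarn-client mode. A sketch with everything spelled out (values mirror the earlier configuration; adjust to your cluster):

```shell
# Sketch: request executors explicitly instead of relying on spark-env.sh
./bin/spark-shell \
  --master yarn-client \
  --num-executors 4 \
  --executor-memory 1G \
  --executor-cores 1
```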
