
Submitting a PySpark app to Spark on YARN in cluster mode

I'm trying to test a big data platform that has been built for the team I work in. It runs Spark on YARN.

Is it possible to create PySpark apps and submit them to a YARN cluster?

I'm able to submit the example SparkPi jar file successfully; it returns its output in the YARN stdout logs.

Here is the PySpark code I'm trying to test:

from pyspark import SparkConf, SparkContext

HDFS_MASTER = 'hadoop-master'

conf = SparkConf()
conf.setMaster('yarn')
conf.setAppName('spark-test')
sc = SparkContext(conf=conf)

# Read the test file from HDFS and count the non-empty lines
distFile = sc.textFile('hdfs://{0}:9000/tmp/test/test.csv'.format(HDFS_MASTER))

nonempty_lines = distFile.filter(lambda x: len(x) > 0)
print('Nonempty lines', nonempty_lines.count())
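As a local sanity check (no cluster required), the filter-and-count logic above is equivalent to this plain-Python sketch; `count_nonempty_lines` is just an illustrative helper, not part of the PySpark API:

```python
def count_nonempty_lines(lines):
    """Plain-Python mirror of distFile.filter(lambda x: len(x) > 0).count()."""
    return sum(1 for line in lines if len(line) > 0)

sample = ["a,b,c", "", "1,2,3", ""]
print('Nonempty lines', count_nonempty_lines(sample))  # prints: Nonempty lines 2
```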

The command I run from CMD in my Spark directory:

bin\spark-submit --master yarn --deploy-mode cluster --driver-memory 4g ^
  --executor-memory 2g --executor-cores 1 examples\sparktest2.py 10

My script is called sparktest2.py and lives in the examples directory inside my Spark directory.

Logs (stderr):

 application from cluster with 3 NodeManagers
 17/03/22 15:18:39 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
 17/03/22 15:18:39 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
 17/03/22 15:18:39 INFO Client: Setting up container launch context for our AM
 17/03/22 15:18:39 ERROR SparkContext: Error initializing SparkContext.
 java.util.NoSuchElementException: key not found: SPARK_HOME
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:1148)
at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:1147)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.yarn.Client.findPySparkArchives(Client.scala:1147)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:829)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
17/03/22 15:18:39 INFO SparkUI: Stopped Spark web UI at http://10.0.9.24:42155
17/03/22 15:18:39 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
17/03/22 15:18:39 INFO YarnClientSchedulerBackend: Stopped
17/03/22 15:18:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/22 15:18:39 INFO MemoryStore: MemoryStore cleared
17/03/22 15:18:39 INFO BlockManager: BlockManager stopped
17/03/22 15:18:39 INFO BlockManagerMaster: BlockManagerMaster stopped
17/03/22 15:18:39 WARN MetricsSystem: Stopping a MetricsSystem that is not running
17/03/22 15:18:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/22 15:18:39 INFO SparkContext: Successfully stopped SparkContext
17/03/22 15:18:39 ERROR ApplicationMaster: User application exited with status 1
17/03/22 15:18:39 INFO ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
17/03/22 15:18:47 ERROR ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
17/03/22 15:18:47 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User application exited with status 1)
17/03/22 15:18:47 INFO ApplicationMaster: Deleting staging directory hdfs://hadoop-master.overlaynet:9000/user/ahmeds/.sparkStaging/application_1489001113497_0038
17/03/22 15:18:47 INFO ShutdownHookManager: Shutdown hook called
17/03/22 15:18:47 INFO ShutdownHookManager: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/ahmeds/appcache/application_1489001113497_0038/spark-1b4d971c-4448-4a5f-b917-3b6e2d31bb95

Errors from stdout:

Traceback (most recent call last):
File "sparktest2.py", line 16, in <module>
sc = SparkContext(conf=conf)
File "/tmp/hadoop-root/nm-local-dir/usercache/ahmeds/appcache/application_1489001113497_0038/container_1489001113497_0038_02_000001/pyspark.zip/pyspark/context.py", line 115, in __init__
File "/tmp/hadoop-root/nm-local-dir/usercache/ahmeds/appcache/application_1489001113497_0038/container_1489001113497_0038_02_000001/pyspark.zip/pyspark/context.py", line 168, in _do_init
File "/tmp/hadoop-root/nm-local-dir/usercache/ahmeds/appcache/application_1489001113497_0038/container_1489001113497_0038_02_000001/pyspark.zip/pyspark/context.py", line 233, in _initialize_context
File "/tmp/hadoop-root/nm-local-dir/usercache/ahmeds/appcache/application_1489001113497_0038/container_1489001113497_0038_02_000001/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1401, in __call__
File "/tmp/hadoop-root/nm-local-dir/usercache/ahmeds/appcache/application_1489001113497_0038/container_1489001113497_0038_02_000001/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.util.NoSuchElementException: key not found: SPARK_HOME
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:1148)
at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:1147)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.yarn.Client.findPySparkArchives(Client.scala:1147)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:829)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)

It seems to be complaining about SPARK_HOME, which I have set in my environment variables.

Any help is greatly appreciated.

Python version: 3.5
Spark version: 2.0.1
OS: Windows 7

What made it work for me was adding the following to my spark-submit command:

--conf spark.yarn.appMasterEnv.SPARK_HOME=/dev/null
--conf spark.executorEnv.SPARK_HOME=/dev/null
--files pythonscript.py
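Putting the flags above together with the original command, the full CMD invocation would look roughly like this (a sketch only; the memory settings and paths are carried over from the question, and the workaround `--conf` flags must appear before the application script):

```shell
bin\spark-submit --master yarn --deploy-mode cluster ^
  --driver-memory 4g --executor-memory 2g --executor-cores 1 ^
  --conf spark.yarn.appMasterEnv.SPARK_HOME=/dev/null ^
  --conf spark.executorEnv.SPARK_HOME=/dev/null ^
  --files examples\sparktest2.py ^
  examples\sparktest2.py 10
```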

I also ran into a similar issue. Set SPARK_HOME in hadoop-env.sh and restart the ResourceManager, NameNode, and DataNode; that should fix it.

./spark-submit --master yarn-cluster --queue default \
--num-executors 20 --executor-memory 1G --executor-cores 3 \
--driver-memory 1G \
--conf spark.yarn.appMasterEnv.SPARK_HOME=/dev/null \
--conf spark.executorEnv.SPARK_HOME=/dev/null \
--files  /home/user/script.py
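The hadoop-env.sh change mentioned above would look something like this (the install path below is an assumption; point it at wherever Spark actually lives on each node):

```shell
# In hadoop-env.sh (typically under $HADOOP_CONF_DIR), assuming Spark at /opt/spark
export SPARK_HOME=/opt/spark
```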
