
What is the correct method to spark-submit Python applications on AWS EMR?

I have connected to the master node of a Spark cluster running in EMR, and I am trying to submit a Python-based application:

spark-submit --verbose --deploy-mode cluster --master yarn-cluster --num-executors 3 --executor-cores 6 --executor-memory 1g test.py 

The process produces a set of log dumps, including the following confirmation that the files were deployed to the cluster:

16/08/29 20:47:51 INFO Client: Uploading resource file:/home/hadoop/test.py -> hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1472396426409_0007/test.py
16/08/29 20:47:51 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1472396426409_0007/pyspark.zip
16/08/29 20:47:51 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.1-src.zip -> hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1472396426409_0007/py4j-0.10.1-src.zip

However, the application then fails to run, reporting that the py4j library is missing:

16/08/29 20:48:47 INFO Client: Application report for application_1472396426409_0007 (state: ACCEPTED)
16/08/29 20:48:48 INFO Client: Application report for application_1472396426409_0007 (state: FAILED)
16/08/29 20:48:48 INFO Client: 
     client token: N/A
     diagnostics: Application application_1472396426409_0007 failed 2 times due to AM Container for appattempt_1472396426409_0007_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://ip-xxx-xxx-xxx-xxx.ec2.internal:8088/cluster/app/application_1472396426409_0007Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1472396426409_0007/py4j-0.10.1-src.zip
java.io.FileNotFoundException: File does not exist: hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1472396426409_0007/py4j-0.10.1-src.zip
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)

Am I misusing the command, or is something else wrong?

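This appears to be a bug in the AWS system. YARN monitors the application and notices that the deployed code no longer exists, which is actually a sign that Spark has finished processing.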

To verify whether this is the issue, double-check by reading the application's logs, i.e. run something like the following on the master node:

yarn logs -applicationId application_1472396426409_0007

Then confirm that you see a success message in the logs:

INFO ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
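
If you would rather not scan the log output by eye, the check above can be scripted. The following is only a minimal sketch: the helper names `am_final_statuses` and `fetch_yarn_logs` are hypothetical, it assumes the `yarn` CLI is on the PATH (as it normally is on an EMR master node) and that log aggregation has completed, and it simply searches for the "Unregistering ApplicationMaster with ..." line shown above.

```python
import re
import subprocess

# Matches the ApplicationMaster line quoted above, capturing the reported status.
_AM_STATUS = re.compile(r"Unregistering ApplicationMaster with (\w+)")


def am_final_statuses(log_text):
    """Return every final status the ApplicationMaster logged, in order of appearance.

    An application with several attempts may log more than one status,
    so a list is returned rather than a single value.
    """
    return _AM_STATUS.findall(log_text)


def fetch_yarn_logs(application_id):
    """Run `yarn logs -applicationId <id>` and return its standard output as text.

    Assumption: the `yarn` command is available on PATH and the
    aggregated logs for the application can be read by the current user.
    """
    result = subprocess.run(
        ["yarn", "logs", "-applicationId", application_id],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

On the master node you could then call `am_final_statuses(fetch_yarn_logs("application_1472396426409_0007"))` and check whether `"SUCCEEDED"` is in the returned list. An empty list means no such line was found, in which case the full log output is worth reading directly.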
