AWS EMR: Python PySpark script in EMR steps
I am trying to run a very simple PySpark script as a step in AWS EMR. It looks like this:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("s3://folder1/file.csv", header=True, inferSchema=True)
dd = df.select(df.columns[0])  # select a column
write_to = "s3://spark-workflow-test/"
dd.write.csv(write_to, sep=";", header=True)
spark.stop()
It reads a file from a folder, selects a column, and writes it to another file in an S3 bucket. For some reason it keeps failing, and I can't figure out why.
The script works fine in local Spark, but as an EMR step it keeps failing with exitCode=13. Is there a problem in the code or the Spark configuration, or do I need to do something in the console/EMR interface? I really have no clue where to look for a solution.