
AWS EMR, python pyspark script in EMR steps

I'm trying to run a very simple pyspark script as a step in AWS EMR. It looks like this:

from pyspark.sql import SparkSession
sc = SparkContext()
df = sc.read.csv("s3://folder1/file.csv",header=True,inferSchema=True)
dd=df.select(df)
write_to = "s3://spark-workflow-test/"
dd.write.csv(write_to, sep = ";", header = True)
sc.stop()

It reads a file from a folder, selects a column, and writes it to another file in a bucket. For some reason it keeps failing and I can't figure out why.

This script works fine in local Spark, but as an EMR step it keeps failing with exitCode=13. Is there a problem in the code or the Spark configuration, or do I need to do something in the console/EMR interface? I really have no clue where to look for a solution.

I think your error is the same as in this issue.

Your Spark context definition seems off. Replace it with:

sc = SparkSession.builder.getOrCreate()
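
For reference, a minimal corrected version of the whole script could look like the sketch below. It keeps the S3 paths from the question; the appName and the selected column are placeholders I've added, not something from the original post. On EMR the session picks up the cluster's YARN master automatically, so don't hard-code a local master.

from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession instead of calling SparkContext() directly
spark = SparkSession.builder.appName("csv-copy").getOrCreate()

# Read the input CSV from S3 (path taken from the question)
df = spark.read.csv("s3://folder1/file.csv", header=True, inferSchema=True)

# Select the column you need; df.columns[0] is just a placeholder
dd = df.select(df.columns[0])

# Write the result back to S3 (path taken from the question)
write_to = "s3://spark-workflow-test/"
dd.write.csv(write_to, sep=";", header=True)

spark.stop()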
