PySpark: error writing a CSV file to S3

Question

I am using PySpark and cannot write to S3, although reading from S3 works fine.

This is my code:

import pandas as pd

dic = {'a': {'c1(%)': 0.0, 'c2': 0, 'c3($)': 260, 'c4(%)': 4.79, 'c5': 78, 'c6': 352}, 'b': {'c1(%)': 0.0, 'c2': 0, 'c3($)': 5, 'c4(%)': 0.09, 'c5': 2, 'c6': 280}, 'c': {'c1(%)': 0.0, 'c2': 0, 'c3($)': 0, 'c4(%)': 0.0, 'c5': 0, 'c6': 267}}

df = pd.DataFrame(dic)

df.to_csv("s3://work/.../filename_2018-01-04_08:50:45.csv")

This is the error:

IOError: [Errno 2] No such file or directory: 's3://work/.../filename_2018-01-04_08:50:45.csv'

What is the problem?

Answer 1

See my comment above: you need to use a Spark DataFrame. Here df is a pandas DataFrame, and the error shows that its to_csv tried to open the s3:// path as a local file. One easy way to fix this is to turn the index of the pandas DataFrame into a column and then convert it to a Spark DataFrame:

df2=sqlContext.createDataFrame(df.reset_index(drop=False))
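
The reset_index step matters because a Spark DataFrame has no row index, so the pandas row labels would be lost in the conversion unless they are first moved into a regular column. A minimal pandas-only sketch of that step, using a trimmed copy of the question's data (the variable name flat is just for illustration):

```python
import pandas as pd

# Trimmed copy of the question's dict: outer keys become columns,
# inner keys become the row index.
dic = {'a': {'c1(%)': 0.0, 'c2': 0}, 'b': {'c1(%)': 0.0, 'c2': 5}}
df = pd.DataFrame(dic)

# drop=False keeps the old index as a new column named "index".
flat = df.reset_index(drop=False)

print(list(df.columns))    # column names before the reset
print(list(flat.columns))  # column names after the reset
print(list(flat['index'])) # the former row labels, now ordinary values
```

After the reset, the former row labels sit in a column called index, which will be carried over to Spark like any other column.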

Then use:

df2.write.save("s3://work/.../filename_2018-01-04_08:50:45.csv", format='csv', header=True)

Answer 1: score 3, accepted, 2018-01-04 10:16:08
