
Unable to save a CSV file using PySpark Dataframe on AWS EMR

I want to save a CSV file with gzip compression. The code runs successfully, but it is failing silently - i.e. I see no file present at the path provided.

I tried reading the file that is supposed to have been saved, but running the command file -i <path_to_the_file> gives me 'No such file found'.

My code for writing the CSV file is:

>>> df
DataFrame[id: int, name: string, alignment: string, gender: string, eyecolor: string, race: string, haircolor: string, publisher: string, skincolor: string, height: int, weight: int, _paseena_row_number_: bigint, _paseena_timestamp_: timestamp, _paseena_commit_id_: string]
>>> df.write.csv('check_csv_post_so.csv')
>>>

Now, when I check, no file exists.

I would put it down to some unknown DFS behaviour, but the catch is that I have worked with Spark on other machines and never ran into this issue.

I expect the file to be present, or the code to fail and show errors.

I think the file is stored on HDFS. Try saving the file with a file:// or s3:// prefix in the path, or use hdfs dfs -ls to see whether the file is there.
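A minimal sketch of that suggestion, combined with the gzip compression the question asks for (the bucket name my-bucket is a placeholder, not from the question):

>>> # Placeholder output location: replace my-bucket with a real S3 bucket,
>>> # or use 'file:///tmp/check_csv_post_so.csv' to write to local disk.
>>> df.write.csv('s3://my-bucket/check_csv_post_so.csv', compression='gzip')

Note that Spark writes the output as a directory of part files rather than a single CSV file, and a path with no scheme resolves against the cluster's default filesystem, which on EMR is HDFS - that is why nothing shows up on the local disk. If the earlier write did go to HDFS, hdfs dfs -ls check_csv_post_so.csv should list the output directory.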
