
Unable to save a CSV file using PySpark Dataframe on AWS EMR

I want to save a CSV file with gzip compression. The code runs successfully, but it fails silently: I see no file at the path provided.

I tried reading the file that was supposedly saved, but running file -i <path_to_the_file> returns 'No such file found'.

My code for writing the CSV file is:

>>> df
DataFrame[id: int, name: string, alignment: string, gender: string, eyecolor: string, race: string, haircolor: string, publisher: string, skincolor: string, height: int, weight: int, _paseena_row_number_: bigint, _paseena_timestamp_: timestamp, _paseena_commit_id_: string]
>>> df.write.csv('check_csv_post_so.csv')
>>>
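For reference, the gzip compression I mentioned would only add a writer option. A sketch, using Spark's standard compression option and the same relative path as above:

>>> df.write.option('compression', 'gzip').csv('check_csv_post_so.csv')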

Now, when I check, no such file exists.

I would put this down to some DFS behaviour I don't know about, but the catch is that I have worked with Spark on other machines and never ran into this issue.

I expect either the file to be present or the code to fail with an error.

The file is most likely being stored on HDFS, the default filesystem on EMR. Try saving with an explicit file:// or s3:// prefix, or run hdfs dfs -ls to see whether the file is there.
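A minimal sketch of both suggestions; the bucket and directory names below are placeholders, so substitute your own:

>>> # Write to S3 with gzip compression (EMR resolves s3:// via EMRFS).
>>> df.write.option('compression', 'gzip').csv('s3://my-bucket/check_csv_post_so')
>>> # Or write to the driver node's local filesystem.
>>> df.write.option('compression', 'gzip').csv('file:///home/hadoop/check_csv_post_so')

Also note that Spark writes a directory of part-* files rather than a single CSV file, so looking for one file named check_csv_post_so.csv will come up empty even when the write succeeded.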
