
PySpark only writing '_temporary' folder when writing parquet

I am trying to write a pyspark df to parquet like this:

df.write.format("parquet") \
    .mode("overwrite") \
    .save("gs://my_bucket/my_folder/filename")

This data frame has millions of rows, and I have written a similar data frame before in a few minutes. This one, however, has been running for 30+ minutes, and all I can see under the output path is _temporary/0/, with nothing else.

I am able to easily write a small data frame and see that it works, but for some reason this one does not. There doesn't appear to be anything wrong with the data frame.

Could there be any reason, besides an issue with the data frame itself, why this is taking so long and nothing is being written? Other similarly sized data frames have had no issues.

  • Your files won't appear until the Spark job has completed.
  • Once the job completes successfully, you will see the output files (one way to check for this is sketched below).
  • This behavior is explained here: Spark _temporary creation reason
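As a minimal sketch of how you might confirm the write was committed: with the default FileOutputCommitter, Spark moves files out of _temporary and writes a _SUCCESS marker on successful completion (assuming the marker has not been disabled via mapreduce.fileoutputcommitter.marksuccessfuljobs). The snippet below checks for that marker through the JVM Hadoop FileSystem API; it relies on PySpark's internal _jvm/_jsc attributes and on the GCS connector being configured on the cluster, and it reuses the output path from the question.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

out_path = "gs://my_bucket/my_folder/filename"  # output path from the question

# Access Hadoop's FileSystem for the bucket via the JVM gateway.
jvm = spark.sparkContext._jvm
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
fs = jvm.org.apache.hadoop.fs.FileSystem.get(
    jvm.java.net.URI.create(out_path), hadoop_conf)

# The _SUCCESS marker only exists after the job has committed its output,
# i.e. after the files have been moved out of _temporary.
committed = fs.exists(jvm.org.apache.hadoop.fs.Path(out_path + "/_SUCCESS"))
print("write committed:", committed)

If this prints False while the job is still running, that matches the behavior described above: the data is still sitting under _temporary and will only appear at the final path once the job finishes.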
