
PySpark only writing '_temporary' folder when writing parquet

I am trying to write a pyspark df to parquet like this:

df.write.format("parquet") \
    .mode("overwrite") \
    .save("gs://my_bucket/my_folder/filename")

This data frame has millions of rows, and I have written a similar data frame before in a few minutes. This one, however, has been running for 30+ minutes, and all I can see under the output path is _temporary/0/, with nothing else.

I am able to easily write a small data frame and see that it works, but for some reason this one does not. There doesn't appear to be anything wrong with the data frame.

Could there be any reason, besides an issue with the data frame itself, why this is taking so long and nothing is being written? Other similarly sized data frames have had no issues.

  • Your files won't appear until the Spark job has completed.
  • Once the job completes successfully, you will see the output files (one way to check for this is sketched below).
  • This behavior is explained here: Spark _temporary creation reason
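As a minimal sketch of how you might confirm the write was committed: with the default FileOutputCommitter, Spark moves files out of _temporary and writes a _SUCCESS marker on successful completion (assuming the marker has not been disabled via mapreduce.fileoutputcommitter.marksuccessfuljobs). The snippet below checks for that marker through the JVM Hadoop FileSystem API; it relies on PySpark's internal _jvm/_jsc attributes and on the GCS connector being configured on the cluster, and it reuses the output path from the question.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

out_path = "gs://my_bucket/my_folder/filename"  # output path from the question

# Access Hadoop's FileSystem for the bucket via the JVM gateway.
jvm = spark.sparkContext._jvm
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
fs = jvm.org.apache.hadoop.fs.FileSystem.get(
    jvm.java.net.URI.create(out_path), hadoop_conf)

# The _SUCCESS marker only exists after the job has committed its output,
# i.e. after the files have been moved out of _temporary.
committed = fs.exists(jvm.org.apache.hadoop.fs.Path(out_path + "/_SUCCESS"))
print("write committed:", committed)

If this prints False while the job is still running, that matches the behavior described above: the data is still sitting under _temporary and will only appear at the final path once the job finishes.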
