
How to write in CSV file without creating folder in pyspark?

When writing a CSV file in PySpark, a folder is automatically created, and inside it the CSV file gets a cryptic name (e.g. part-00000-....csv). How can I write the CSV with a specific file name, without the enclosing folder, in PySpark (not pandas)?

That's just how Spark's parallelizing mechanism works. A Spark application is meant to have one or more workers that read your data and write it out to a location. When you write a CSV file, a directory containing multiple files is what allows multiple workers to write at the same time.

If you're using HDFS, you can consider writing a separate bash script afterwards to move or rename the files the way you want.
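The same post-processing idea can be sketched in Python for a local (or FUSE-mounted) path; on a real HDFS cluster the equivalent would be an `hdfs dfs -mv` call in a shell script. The helper name here is mine, not a Spark API:

```python
import glob
import os


def rename_part_file(spark_output_dir, target_csv):
    """Move the single part-*.csv file that Spark wrote inside
    `spark_output_dir` to the path `target_csv`.

    Assumes the data was written with one partition (e.g. coalesce(1)),
    so exactly one part file exists.
    """
    parts = glob.glob(os.path.join(spark_output_dir, "part-*.csv"))
    if len(parts) != 1:
        raise RuntimeError(f"expected exactly one part file, found {len(parts)}")
    os.rename(parts[0], target_csv)
```

The Spark output directory itself is left in place; remove it afterwards if you no longer need the `_SUCCESS` marker and metadata files.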

If you're using Databricks, you can use the dbutils.fs utilities (e.g. dbutils.fs.ls, dbutils.fs.mv) to interact with DBFS files in the same way.

This is the way Spark is designed: it writes out multiple files in parallel, which is faster for big datasets. But you can still get a single output file by reducing to one partition before writing, e.g. coalesce(1) on a DataFrame (or coalesce(1, true).saveAsTextFile() in the old RDD API).
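A minimal sketch of that approach, assuming a local filesystem output path (the helper name and paths are illustrative, not part of the Spark API). Note that coalesce(1) pulls all the data into a single partition, so it is only suitable when the data fits on one worker:

```python
import glob
import os
import shutil


def write_single_csv(df, out_path):
    """Write a Spark DataFrame `df` as one CSV file named `out_path`.

    Spark still writes a directory (here `out_path + "_tmp"`), but with
    coalesce(1) it contains exactly one part file, which we then move
    and rename, before deleting the temporary directory.
    """
    tmp_dir = out_path + "_tmp"
    # One partition -> one part-*.csv file inside tmp_dir.
    df.coalesce(1).write.mode("overwrite").csv(tmp_dir, header=True)
    part_file = glob.glob(os.path.join(tmp_dir, "part-*.csv"))[0]
    shutil.move(part_file, out_path)  # give the file the name we want
    shutil.rmtree(tmp_dir)            # drop the now-empty Spark directory
```

Usage would simply be write_single_csv(df, "FileName.csv") on the driver.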

In PySpark, the following code helped me write data directly into a single CSV file:

df.toPandas().to_csv('FileName.csv')

Be aware that toPandas() collects the entire DataFrame into driver memory, so this only works for data small enough to fit on the driver.
