
How to write in CSV file without creating folder in pyspark?

When writing a CSV file in PySpark, a folder is automatically created, and inside it the CSV file gets a cryptic name (e.g. part-00000-....csv). How can I write the CSV with a specific file name, without the enclosing folder, in PySpark (not pandas)?

That's just how Spark's parallelizing mechanism works. A Spark application is meant to have one or more workers that read your data and write it out to a location. When you write a CSV file, a directory containing multiple files is what allows multiple workers to write at the same time.

If you're using HDFS, you can consider writing a separate bash script afterwards to move or rename the files the way you want.
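The same post-processing idea can be sketched in Python for a local (or FUSE-mounted) path; on a real HDFS cluster the equivalent would be an `hdfs dfs -mv` call in a shell script. The helper name here is mine, not a Spark API:

```python
import glob
import os


def rename_part_file(spark_output_dir, target_csv):
    """Move the single part-*.csv file that Spark wrote inside
    `spark_output_dir` to the path `target_csv`.

    Assumes the data was written with one partition (e.g. coalesce(1)),
    so exactly one part file exists.
    """
    parts = glob.glob(os.path.join(spark_output_dir, "part-*.csv"))
    if len(parts) != 1:
        raise RuntimeError(f"expected exactly one part file, found {len(parts)}")
    os.rename(parts[0], target_csv)
```

The Spark output directory itself is left in place; remove it afterwards if you no longer need the `_SUCCESS` marker and metadata files.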

If you're using Databricks, you can use the dbutils.fs utilities (e.g. dbutils.fs.ls, dbutils.fs.mv) to interact with DBFS files in the same way.

This is the way Spark is designed: it writes out multiple files in parallel, which is faster for big datasets. But you can still get a single output file by reducing to one partition before writing, e.g. coalesce(1) on a DataFrame (or coalesce(1, true).saveAsTextFile() in the old RDD API).
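A minimal sketch of that approach, assuming a local filesystem output path (the helper name and paths are illustrative, not part of the Spark API). Note that coalesce(1) pulls all the data into a single partition, so it is only suitable when the data fits on one worker:

```python
import glob
import os
import shutil


def write_single_csv(df, out_path):
    """Write a Spark DataFrame `df` as one CSV file named `out_path`.

    Spark still writes a directory (here `out_path + "_tmp"`), but with
    coalesce(1) it contains exactly one part file, which we then move
    and rename, before deleting the temporary directory.
    """
    tmp_dir = out_path + "_tmp"
    # One partition -> one part-*.csv file inside tmp_dir.
    df.coalesce(1).write.mode("overwrite").csv(tmp_dir, header=True)
    part_file = glob.glob(os.path.join(tmp_dir, "part-*.csv"))[0]
    shutil.move(part_file, out_path)  # give the file the name we want
    shutil.rmtree(tmp_dir)            # drop the now-empty Spark directory
```

Usage would simply be write_single_csv(df, "FileName.csv") on the driver.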

In PySpark, the following code helped me write data directly into a single CSV file:

df.toPandas().to_csv('FileName.csv')

Be aware that toPandas() collects the entire DataFrame into driver memory, so this only works for data small enough to fit on the driver.
