
How to specify a name for the CSV file that I save to S3 with Scala

When I try to save a DataFrame as CSV to S3, the file is created with a name that is generated by Spark. For example -

  file.coalesce(1).write.option("header", "true").csv(bucket + "/fileName.csv")

This creates a directory called fileName.csv inside the bucket, containing a file called part-00000-955faf13-9fc3-4ccc-b0df-fb91cd701901-c000.csv

How can I change the file's name or save it with a specific name?

Spark's write method can't directly control the name of the file that is written; it only controls the name of the output directory. It is possible, however, to rename the file after processing:

  import org.apache.hadoop.fs._

  // Rename the part file that Spark wrote inside the output directory.
  // ("part-0000" is a placeholder for the actual part file name.)
  FileSystem.get(sc.hadoopConfiguration).rename(
    new Path("dir/oldName.csv/part-0000"),
    new Path("dir/newName.csv"))

This is what eventually worked for me, after the file had been saved -

  import org.apache.hadoop.fs._

  val src = new Path(s"s3a://$bucketName/$pathToDir")
  val fs = src.getFileSystem(sc.hadoopConfiguration)
  // List everything Spark wrote into the output directory...
  val status = fs.listStatus(src)
  // ...and rename each entry to the desired file name.
  status.foreach(filename => {
    fs.rename(new Path(s"s3a://$bucketName/$pathToDir/${filename.getPath.getName}"),
      new Path(s"s3a://$bucketName/$pathToDir/$newFileName"))
  })
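
One caveat with the loop above: listStatus also returns Spark's _SUCCESS marker, so every entry gets renamed to the same target. A variant that renames only the CSV part file and deletes the marker might look like this (assuming, as above, that sc, bucketName, pathToDir and newFileName are already defined) -

  import org.apache.hadoop.fs._

  // Variant sketch: rename only the part file, delete the _SUCCESS marker.
  val dir = new Path(s"s3a://$bucketName/$pathToDir")
  val fs = dir.getFileSystem(sc.hadoopConfiguration)

  fs.listStatus(dir).foreach { entry =>
    val name = entry.getPath.getName
    if (name.startsWith("part-") && name.endsWith(".csv"))
      fs.rename(entry.getPath, new Path(dir, newFileName)) // the actual data file
    else if (name == "_SUCCESS")
      fs.delete(entry.getPath, false)                      // job-completion marker
  }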
