
How to specify a name for the CSV file that I save to S3 with Scala

When I try to save a DataFrame as CSV to S3, the file is created with a name that is generated by Spark. For example -

  file.coalesce(1).write.option("header", "true").csv(bucket + "/fileName.csv")

This creates a directory called fileName.csv inside the bucket, containing a file called part-00000-955faf13-9fc3-4ccc-b0df-fb91cd701901-c000.csv

How can I change the file's name or save it with a specific name?

Spark's write method can't directly control the name of the file that is written; it only controls the name of the output directory. It is possible, however, to rename the file after processing:

  import org.apache.hadoop.fs._

  // Rename the part file that Spark wrote inside the output directory.
  // ("part-0000" is a placeholder for the actual part file name.)
  FileSystem.get(sc.hadoopConfiguration).rename(
    new Path("dir/oldName.csv/part-0000"),
    new Path("dir/newName.csv"))

This is what eventually worked for me, after the file had been saved -

  import org.apache.hadoop.fs._

  val src = new Path(s"s3a://$bucketName/$pathToDir")
  val fs = src.getFileSystem(sc.hadoopConfiguration)
  // List everything Spark wrote into the output directory...
  val status = fs.listStatus(src)
  // ...and rename each entry to the desired file name.
  status.foreach(filename => {
    fs.rename(new Path(s"s3a://$bucketName/$pathToDir/${filename.getPath.getName}"),
      new Path(s"s3a://$bucketName/$pathToDir/$newFileName"))
  })
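
One caveat with the loop above: listStatus also returns Spark's _SUCCESS marker, so every entry gets renamed to the same target. A variant that renames only the CSV part file and deletes the marker might look like this (assuming, as above, that sc, bucketName, pathToDir and newFileName are already defined) -

  import org.apache.hadoop.fs._

  // Variant sketch: rename only the part file, delete the _SUCCESS marker.
  val dir = new Path(s"s3a://$bucketName/$pathToDir")
  val fs = dir.getFileSystem(sc.hadoopConfiguration)

  fs.listStatus(dir).foreach { entry =>
    val name = entry.getPath.getName
    if (name.startsWith("part-") && name.endsWith(".csv"))
      fs.rename(entry.getPath, new Path(dir, newFileName)) // the actual data file
    else if (name == "_SUCCESS")
      fs.delete(entry.getPath, false)                      // job-completion marker
  }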
