
Create a CSV file with a timestamp as the file name from a DataFrame in Scala

I have a dataframe with data as follows.

+---------------+-------+
|category       |marks  |
+---------------+-------+
|cricket        |1.0    |
|tennis         |1.0    |
|football       |2.0    |
+---------------+-------+

I want to write the above dataframe to a CSV file whose name is the current timestamp.

generatedDataFrame.write.mode ("append")
    .format("com.databricks.spark.csv").option("delimiter", ";").save("./src/main/resources-"+LocalDateTime.now()+".csv")

But this code does not work. It fails with the following error:

java.io.IOException: Mkdirs failed to create file

Is there a better way to achieve this using Scala and Spark? Also, even though I am trying to create a file named with the timestamp, the code creates a directory with that name, and inside that directory a CSV file with a random name. How can I give the CSV file itself the timestamp name instead of creating a directory?

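DF.write.csv always creates a directory with the name you specify and places the output CSV files (one part file per partition) in that directory.

If you want a single CSV file as output, named with the timestamp, you can use the code below. It coalesces the data to one partition so that a single part file is written, then renames that file: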

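import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.sql._
import org.apache.hadoop.fs.{FileSystem, Path}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

// Hadoop FileSystem handle, used to find and rename the file Spark writes
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// coalesce(1) puts all rows in one partition, so only one part file is written
generatedDataFrame.coalesce(1).write.mode("append").csv("./src/main/resources/outputcsv/")

// Spark names the file part-...; take the first match in the output directory
val outFileName = fs.globStatus(new Path("./src/main/resources/outputcsv/part*"))(0).getPath.getName

// Minute resolution: two runs within the same minute produce the same name
val timestamp = new SimpleDateFormat("yyyyMMddHHmm").format(new Date())

// Rename the part file to <timestamp>.csv inside the same directory
fs.rename(new Path(s"./src/main/resources/outputcsv/$outFileName"), new Path(s"./src/main/resources/outputcsv/${timestamp}.csv"))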
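
If the output directory is on the local filesystem, the rename step could also be sketched with java.nio.file alone, without the Hadoop FileSystem API. This is only an illustration: RenamePartFile and renamePartFile are hypothetical names, and the glob "part-*.csv" assumes Spark's usual part-file naming. Any checksum or _SUCCESS files Spark may have written are left untouched.

```scala
import java.nio.file.{Files, Path}

object RenamePartFile {
  // Hypothetical helper: finds the first file matching part-*.csv in outputDir
  // and renames it to <baseName>.csv in the same directory.
  // Returns the new path, or None if no part file was found.
  def renamePartFile(outputDir: Path, baseName: String): Option[Path] = {
    // Look up the part file first and close the directory stream before moving it.
    val partFile: Option[Path] = {
      val stream = Files.newDirectoryStream(outputDir, "part-*.csv")
      try {
        val it = stream.iterator()
        if (it.hasNext) Some(it.next()) else None
      } finally stream.close()
    }
    // Files.move without options throws FileAlreadyExistsException
    // if <baseName>.csv already exists, so nothing is silently overwritten.
    partFile.map(p => Files.move(p, outputDir.resolve(s"$baseName.csv")))
  }
}
```

It would be called after the write has finished, for example RenamePartFile.renamePartFile(java.nio.file.Paths.get("src/main/resources/outputcsv"), timestamp).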

You should use src/main/resources rather than ./src/main/resources. You can check the permissions for directory creation from the command line. Also, using LocalDateTime.now() directly in the path produces something like "2021-03-01T13:39:09.646". I am not sure whether this is what you want, or even whether it is valid in HDFS paths (characters such as ':'), so I would suggest formatting the date as well.
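
The date-formatting suggestion might look like the sketch below, which uses java.time.format.DateTimeFormatter to build a file name containing only digits and an underscore. The object name TimestampFileName, the method csvName and the pattern "yyyyMMdd_HHmmss" are assumptions chosen for illustration, not something the question prescribes.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object TimestampFileName {
  // Assumed pattern: digits and '_' only, so the name has none of the
  // ':' or '.' characters that LocalDateTime.toString produces.
  private val formatter = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss")

  // Builds "<formatted timestamp>.csv" for the given point in time.
  def csvName(now: LocalDateTime): String = s"${formatter.format(now)}.csv"
}

// Example:
//   TimestampFileName.csvName(LocalDateTime.of(2021, 3, 1, 13, 39, 9))
//   returns "20210301_133909.csv"
```

Passing the time in as a parameter, rather than calling LocalDateTime.now() inside the method, keeps the naming logic easy to test. In the job itself it would be called as TimestampFileName.csvName(LocalDateTime.now()).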

