
Create a timestamp column in Spark dataframe without milliseconds part

I am trying to create a new column called load_time_stamp in a PySpark DataFrame that should contain only today's date and time down to seconds, without the milliseconds part.

I wrote the code below for this, but the new column is created with null values rather than the timestamp values I expected.

from pyspark.sql import functions as F

x.withColumn("load_time_stamp", F.to_timestamp(F.substring(F.current_timestamp(), 0, 19), "yyyy-MM-dd'T'HH:mm:ss")).show()

You can use date_format instead:

import pyspark.sql.functions as F

x.withColumn("load_time_stamp", F.date_format(F.current_timestamp(), "yyyy-MM-dd'T'HH:mm:ss"))

Note that to_timestamp parses a string in the given format into a timestamp, while date_format formats a timestamp into a string of the given format. In your code, current_timestamp rendered as a string looks like 2021-02-22 15:35:34.123, with a space rather than a 'T' between the date and the time, so the pattern yyyy-MM-dd'T'HH:mm:ss fails to match and to_timestamp returns null. You also do not need to substring the current timestamp: date_format drops the milliseconds for you when formatting to the desired pattern.

If you want a timestamp-typed column with only seconds precision, you can use the from_unixtime function. Note that from_unixtime returns a string, so cast the result back to timestamp to keep the column typed as a timestamp.

Example:

from pyspark.sql import functions as F

x = spark.createDataFrame([(1,)], ["id"])

x.withColumn(
    "load_time_stamp",
    F.from_unixtime(F.current_timestamp().cast("long")).cast("timestamp")
).show(truncate=False)

#+---+-------------------+
#|id |load_time_stamp    |
#+---+-------------------+
#|1  |2021-02-22 15:35:34|
#+---+-------------------+
