
Create a timestamp column in Spark dataframe without milliseconds part

I am trying to create a new column called load_time_stamp in a PySpark DataFrame that should contain only today's date and time down to seconds, without the milliseconds part.

I wrote the code below for this, but the new column is created with null values rather than the timestamp values I expected.

from pyspark.sql import functions as F

x.withColumn("load_time_stamp", F.to_timestamp(F.substring(F.current_timestamp(), 0, 19), "yyyy-MM-dd'T'HH:mm:ss")).show()

You can use date_format instead:

import pyspark.sql.functions as F

x.withColumn("load_time_stamp", F.date_format(F.current_timestamp(), "yyyy-MM-dd'T'HH:mm:ss"))

Note that to_timestamp parses a string in the given format into a timestamp, while date_format formats a timestamp into a string of the given format. In your code, current_timestamp rendered as a string looks like 2021-02-22 15:35:34.123, with a space rather than a 'T' between the date and the time, so the pattern yyyy-MM-dd'T'HH:mm:ss fails to match and to_timestamp returns null. You also do not need to substring the current timestamp: date_format drops the milliseconds for you when formatting to the desired pattern.

If you want a timestamp-typed column with only seconds precision, you can use the from_unixtime function. Note that from_unixtime returns a string, so cast the result back to timestamp to keep the column typed as a timestamp.

Example:

from pyspark.sql import functions as F

x = spark.createDataFrame([(1,)], ["id"])

x.withColumn(
    "load_time_stamp",
    F.from_unixtime(F.current_timestamp().cast("long")).cast("timestamp")
).show(truncate=False)

#+---+-------------------+
#|id |load_time_stamp    |
#+---+-------------------+
#|1  |2021-02-22 15:35:34|
#+---+-------------------+
