I am trying to create a new column called load_time_stamp
in a PySpark DataFrame that should contain only today's date and time with second precision, and no milliseconds.
I wrote the code below, but the new column is created with null values instead of the timestamp values I expected.
from pyspark.sql import functions as F
x.withColumn("load_time_stamp", F.to_timestamp(F.substring(F.current_timestamp(), 0, 19), "yyyy-MM-dd'T'HH:mm:ss")).show()
You can use date_format instead:
import pyspark.sql.functions as F
x.withColumn("load_time_stamp", F.date_format(F.current_timestamp(), "yyyy-MM-dd'T'HH:mm:ss"))
Note that to_timestamp parses a string in the given format into a timestamp, while date_format formats a timestamp into a string using the given pattern. You do not need to substring the current timestamp: date_format already truncates to seconds when the pattern only goes down to seconds.
If you want a timestamp-typed column with only second precision, you can truncate the current timestamp to whole seconds by round-tripping through from_unixtime. Note that from_unixtime itself returns a string, so cast the result back to timestamp:
Example:
from pyspark.sql import functions as F
x = spark.createDataFrame([(1,)], ["id"])
x.withColumn(
    "load_time_stamp",
    F.from_unixtime(F.current_timestamp().cast("long")).cast("timestamp")
).show(truncate=False)
#+---+-------------------+
#|id |load_time_stamp |
#+---+-------------------+
#|1 |2021-02-22 15:35:34|
#+---+-------------------+