
Create a timestamp column in Spark dataframe without milliseconds part

I am trying to create a new column called load_time_stamp in a PySpark data frame that should contain only today's date and time down to the second, without the milliseconds part.

I have written the code below for this, but it creates a new column filled with null values instead of the timestamp values I expected.

from pyspark.sql import functions as F

x.withColumn("load_time_stamp", F.to_timestamp(F.substring(F.current_timestamp(), 0, 19), "yyyy-MM-dd'T'HH:mm:ss")).show()

You can use date_format instead:

import pyspark.sql.functions as F

x.withColumn("load_time_stamp", F.date_format(F.current_timestamp(), "yyyy-MM-dd'T'HH:mm:ss"))

Note that to_timestamp parses a string into a timestamp using the given format, while date_format renders a timestamp as a string in the given format. You do not need to substring the current timestamp, because date_format takes care of that when formatting to the desired pattern. (Your original code produced nulls because substring implicitly casts current_timestamp() to a string, which uses a space between the date and time rather than the 'T' your pattern expects, so the parse fails.)

If you want a timestamp-typed column with only seconds precision, then you can use the from_unixtime function.

Example:

from pyspark.sql import functions as F

x = spark.createDataFrame([(1,)], ["id"])

x.withColumn(
    "load_time_stamp",
    F.from_unixtime(F.current_timestamp().cast("long"))
).show(truncate=False)

#+---+-------------------+
#|id |load_time_stamp    |
#+---+-------------------+
#|1  |2021-02-22 15:35:34|
#+---+-------------------+
