
Create a timestamp column in Spark dataframe without milliseconds part

I am trying to create a new column called load_time_stamp in a PySpark data frame that should contain only today's date and time down to the second, without the milliseconds part.

I have written the code below for this, but it creates a new column filled with null values instead of the timestamp values I expected.

from pyspark.sql import functions as F

x.withColumn("load_time_stamp", F.to_timestamp(F.substring(F.current_timestamp(), 0, 19), "yyyy-MM-dd'T'HH:mm:ss")).show()

You can use date_format instead:

import pyspark.sql.functions as F

x.withColumn("load_time_stamp", F.date_format(F.current_timestamp(), "yyyy-MM-dd'T'HH:mm:ss"))

Note that to_timestamp parses a string into a timestamp using the given format, while date_format renders a timestamp as a string in the given format. You do not need to substring the current timestamp, because date_format takes care of that when formatting to the desired pattern. (Your original code produced nulls because substring implicitly casts current_timestamp() to a string, which uses a space between the date and time rather than the 'T' your pattern expects, so the parse fails.)

If you want a timestamp-typed column with only seconds precision, then you can use the from_unixtime function.

Example:

from pyspark.sql import functions as F

x = spark.createDataFrame([(1,)], ["id"])

x.withColumn(
    "load_time_stamp",
    F.from_unixtime(F.current_timestamp().cast("long"))
).show(truncate=False)

#+---+-------------------+
#|id |load_time_stamp    |
#+---+-------------------+
#|1  |2021-02-22 15:35:34|
#+---+-------------------+
