String to datetime conversion is failing
The conversion of a string column to datetime is failing. The data in the dataframe has the following format: "2020-08-05T12:34:10.800046".
I used the pattern yyyy-MM-dd'T'HH:mm:ss.SSSSSS:
config_df.withColumn(
    "modifiedDate",
    F.to_timestamp(config_df["modifiedDate"], "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"),
).show()
+------------+
|modifiedDate|
+------------+
|        null|
+------------+
The code runs without errors, but all values in the updated column are NULL. Which format should I use?
According to this post, SSS is for milliseconds. Therefore, it matches only the first 3 digits (800) of your 800046, no matter how many S you add.
I couldn't find any pattern that matches your date, so you first need to trim your string so that only 3 fractional digits remain. For example, with a regex:
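from pyspark.sql import functions as F

# Sample data with 6 fractional digits (assumes an existing SparkSession named `spark`)
a = [
    ("2020-08-05T12:34:10.800123",),
]
b = ["modifiedDate"]
df = spark.createDataFrame(a, b)

df.withColumn(
    "modifiedDate",
    F.to_timestamp(
        # Keep only the part of the string up to the third fractional digit
        F.regexp_extract(
            "modifiedDate", r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}", 0
        ),
        "yyyy-MM-dd'T'HH:mm:ss.SSS",
    ),
).show()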
+-------------------+
|       modifiedDate|
+-------------------+
|2020-08-05 12:34:10|
+-------------------+
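To see what the regex actually keeps, the truncation step can be tried outside Spark. The following is only a minimal sketch in plain Python: the helper name truncate_to_millis is made up for illustration, and it returns None on a non-matching string, whereas Spark's regexp_extract returns an empty string in that case.

```python
import re
from datetime import datetime

# Same regex as in the Spark example: everything up to the third fractional digit.
MILLIS_PREFIX = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}")


def truncate_to_millis(value):
    """Hypothetical helper: return the 'yyyy-MM-ddTHH:mm:ss.SSS' part of value, or None."""
    match = MILLIS_PREFIX.search(value)
    return match.group(0) if match else None


raw = "2020-08-05T12:34:10.800046"
trimmed = truncate_to_millis(raw)

# Python's %f accepts 1 to 6 fractional digits, so the trimmed string parses directly.
parsed = datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S.%f")

print(trimmed)
print(parsed)
```

The trimmed value is the string that the Spark example passes to to_timestamp with the yyyy-MM-dd'T'HH:mm:ss.SSS pattern; the last three digits of the original fraction are discarded.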