
String to datetime conversion is failing

The conversion of the string to datetime is failing. The data in the dataframe has the following format: "2020-08-05T12:34:10.800046".
I used the pattern yyyy-MM-dd'T'HH:mm:ss.SSSSSS:

config_df.withColumn(
    "modifiedDate",
    F.to_timestamp(config_df["modifiedDate"], "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"),
).show()

+------------+
|modifiedDate|
+------------+
|        null|
+------------+

The code runs without errors, but all values in the updated column are NULL. Which format should I use?

According to this post, SSS is for milliseconds. It therefore matches only the first 3 digits, 800, of your 800046, no matter how many S you add.

I couldn't find any pattern that matches your date, so you first need to trim the string to keep only 3 fractional digits at the end, for example with a regex:

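from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# One sample row with six fractional digits, as in the question
a = [
    ("2020-08-05T12:34:10.800123",),
]
b = ["modifiedDate"]
df = spark.createDataFrame(a, b)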

df.withColumn(
    "modifiedDate",
    F.to_timestamp(
        F.regexp_extract(
            "modifiedDate", r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}", 0
        ),
        "yyyy-MM-dd'T'HH:mm:ss.SSS",
    ),
).show()


+-------------------+
|       modifiedDate|
+-------------------+
|2020-08-05 12:34:10|
+-------------------+
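
To check what the regular expression in the answer keeps before Spark ever sees the string, the same pattern can be tried in plain Python. This is only a sketch outside Spark: the helper name `truncate_to_millis` is hypothetical, and returning an empty string on no match is meant to mirror how `regexp_extract` behaves, not to reproduce Spark's parser.

```python
import re
from datetime import datetime

# Same pattern as in the Spark snippet: date, 'T', time, and exactly
# three fractional digits.
MILLIS_PREFIX = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}")


def truncate_to_millis(value: str) -> str:
    """Hypothetical helper: keep the millisecond-precision part, or "" if no match."""
    match = MILLIS_PREFIX.search(value)
    return match.group(0) if match else ""


raw = "2020-08-05T12:34:10.800123"
trimmed = truncate_to_millis(raw)
print(trimmed)  # 2020-08-05T12:34:10.800

# Python's %f accepts one to six fractional digits, so the trimmed
# string parses; the last three digits (123) are simply dropped.
parsed = datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S.%f")
print(parsed.microsecond)  # 800000
```

The trade-off is the same as in the Spark version: anything beyond millisecond precision is discarded rather than rounded.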
