
How to convert a timestamp with 6-digit fractional seconds (microseconds) using the to_timestamp function in PySpark

I have a column in my dataframe that holds timestamps as strings, in a format like 2022-07-28T10:38:50.926866Z.

I want to convert this column into actual timestamps. I've searched around, but every time I try to_timestamp on this kind of data I get nulls.

Things I've tried:

from pyspark.sql import functions as F

df = spark.createDataFrame([("2022-07-28T10:38:50.926866Z",)],['date_str'])

df.withColumn("ts1", F.to_timestamp(F.col('date_str'), "yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z'")).show(truncate=False)

This always gives me null, but when I run something similar on an example with only 3 fractional digits (milliseconds), it seems to work:

df = spark.createDataFrame([("2022-07-28T10:38:50.926Z",)],['date_str'])

df.withColumn("ts1", F.to_timestamp(F.col('date_str'), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")).show(truncate=False)

I'm completely lost on how to handle this string conversion.

I ended up solving this by first removing the last 4 characters of each timestamp string (the final three fractional digits and the trailing Z) and then running to_timestamp. I don't mind losing the sub-millisecond precision, so this worked for me.

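# Keep the first 23 characters, e.g. "2022-07-28T10:38:50.926"
df = df.withColumn("date_str", F.substring("date_str", 1, 23))
# Parse the truncated string; the column is overwritten with the timestamp
df.withColumn("date_str", F.to_timestamp(F.col("date_str"), "yyyy-MM-dd'T'HH:mm:ss.SSS")).show()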
