I have a timestamp column in my DataFrame whose values are currently strings in a format like 2022-07-28T10:38:50.926866Z.
I want to convert this column into actual timestamps. I've searched around, but every time I try to_timestamp on this kind of data I get nulls.
Things I've tried:
from pyspark.sql import functions as F
from pyspark.sql.functions import col
df = spark.createDataFrame([("2022-07-28T10:38:50.926866Z",)],['date_str'])
df.withColumn("ts1", F.to_timestamp(col('date_str'), "yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z'")).show(truncate=False)
This always gives me null, but when I run the same thing on an example with only 3 fractional-second digits (milliseconds), it works:
df = spark.createDataFrame([("2022-07-28T10:38:50.926Z",)],['date_str'])
df.withColumn("ts1", F.to_timestamp(col('date_str'), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")).show(truncate=False)
I'm completely lost on how to handle this string conversion.
I ended up solving this by removing the last 4 characters of each timestamp string (the final three fractional digits and the trailing Z) and then running to_timestamp. I don't mind losing the precision beyond milliseconds, so this worked for me.
df = df.withColumn("date_str", F.substring("date_str", 1, 23))
df.withColumn("date_str", F.to_timestamp(F.col("date_str"), "yyyy-MM-dd'T'HH:mm:ss.SSS")).show(truncate=False)
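To double-check the substring arithmetic without starting Spark, here is a small plain-Python sketch. It is not part of the Spark code above: `truncate_to_millis` is a hypothetical helper that only mirrors what F.substring("date_str", 1, 23) keeps, and `datetime.strptime` stands in for to_timestamp purely to show which digits survive.

```python
from datetime import datetime

RAW = "2022-07-28T10:38:50.926866Z"


def truncate_to_millis(ts: str) -> str:
    """Keep the first 23 characters, mirroring F.substring(col, 1, 23)."""
    return ts[:23]


trimmed = truncate_to_millis(RAW)
dropped = RAW[23:]  # the 4 characters that the substring call discards

# Python's %f accepts 1 to 6 fractional digits, so both strings parse here.
# This says nothing about how Spark's own pattern letters behave.
full = datetime.strptime(RAW, "%Y-%m-%dT%H:%M:%S.%fZ")
millis = datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S.%f")

print(trimmed)
print(dropped)
print(full.microsecond, millis.microsecond)
```

The trimmed string still carries the millisecond part; only the last three fractional digits and the Z are gone. One thing to keep in mind is that with the Z removed, the string no longer states that the value is UTC.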