简体   繁体   中英

Convert string column to date in pyspark

I've a dataframe where the date/time column is of string datatype and looks something like "Tue Apr 21 01:16:19 2020" . How do I convert this to a date column with format as 2020/04/21 in pyspark. I tried something like this,

option1:

df = df.withColumn("event_time2",from_unixtime(unix_timestamp(col("Event_time"), 'MM/dd/yyy')))

option2:

df= df.withColumn("event_time2",unix_timestamp(col("Event_time"),'yyyy-MM-dd HH:mm:ss').cast("timestamp"))

but both return null

You could use to_date and date_format . EEE is for day in the week . Refer to Java Simple Data Format for the complete list

from pyspark.sql import functions as F
df.withColumn("Event_time2", F.to_date("Event_time", 'EEE MMM dd HH:mm:ss yyyy')).show(truncate=False)

#+------------------------+-----------+
#|Event_time              |Event_time2|
#+------------------------+-----------+
#|Tue Apr 21 01:16:19 2020|2020-04-21 |
#+------------------------+-----------+


df.withColumn("Event_time2", F.date_format(F.to_date("Event_time", 'EEE MMM dd HH:mm:ss yyyy'),'yyyy/MM/dd')).show(truncate=False)

#+------------------------+-----------+
#|Event_time              |Event_time2|
#+------------------------+-----------+
#|Tue Apr 21 01:16:19 2020|2020/04/21 |
#+------------------------+-----------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM