
Changing date format in PySpark

I am trying to change the date format of a column in a Spark DataFrame from 20211018 to 202110180000.

I have the following code to create a new dt column from the existing data_dt column:

import datetime
from pyspark.sql.functions import udf, col, from_unixtime, unix_timestamp
from pyspark.sql.types import DateType

func = udf(lambda x: datetime.datetime.strptime(x, '%Y%m%d'), DateType())

result_df = result_df.withColumn('dt', func(col('data_dt')))
result_df = result_df.select('data_dt',
                             from_unixtime(unix_timestamp('data_dt', '%Y%m%d0000')).alias('dt'))

which throws an error:

'ValueError: time data '20211018' does not match format '%Y%m%d0000''

I also tried what I thought was the correct format, "%Y%m%d%H%M", but it throws another error. Please let me know how to fix this; I want 0000 appended at the end by default.

There is no need for a UDF. Note that Spark's built-in datetime functions expect Java DateTimeFormatter patterns such as yyyyMMdd, not Python strftime directives like %Y%m%d, which is why the formats above fail. Simply parse the string into a date type using to_date, then apply the date_format function:

from pyspark.sql import functions as F

df = spark.createDataFrame([("20211018",)], ["data_dt"])

result_df = df.withColumn(
    "dt",
    # parse the yyyyMMdd string into a date, then format it back with HHmm (00:00 by default)
    F.date_format(F.to_date("data_dt", "yyyyMMdd"), "yyyyMMddHHmm")
)

result_df.show()
#+--------+------------+
#| data_dt|          dt|
#+--------+------------+
#|20211018|202110180000|
#+--------+------------+
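
If a proper timestamp column is needed rather than a formatted string, to_timestamp can parse the value directly. A minimal sketch reusing the sample df above (the ts column name is just illustrative); the time of day defaults to midnight:

result_df = df.withColumn(
    "ts",
    # parse the yyyyMMdd string into a TimestampType; hour and minute default to 00:00
    F.to_timestamp("data_dt", "yyyyMMdd")
)
# 'ts' now holds 2021-10-18 00:00:00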

See the Spark docs on Datetime Patterns for Formatting and Parsing.
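
Since the desired suffix 0000 is constant and data_dt is already a fixed-width yyyyMMdd string, a pattern-free alternative is plain concatenation. This is only a sketch, not part of the original answer, and it performs no validation that data_dt holds a real date:

# append the constant "0000" suffix directly, skipping date parsing entirely
result_df = df.withColumn("dt", F.concat("data_dt", F.lit("0000")))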
