
Change column data type from string to date with a custom format

I have a DataFrame with a string column called "data" holding dates in the format 02/09/2019 (dd/MM/yyyy). I want to change the column's data type from STRING to DATE while keeping the same format. I'm using Spark 2.1.0.

I've tried the statement:

df.select(to_date( unix_timestamp($"data", "dd/MM/yyyy").cast("timestamp")))

It converts the column from STRING to DATE, but the values come out in yyyy-MM-dd format:

+----------+
|      data|
+----------+
|2003-07-22|
|2003-08-01|
+----------+

Using the date_format function, I obtain the right format but the wrong data type (STRING again):

df.select(date_format(to_date( unix_timestamp($"data", "dd/MM/yyyy").cast("timestamp")), "dd/MM/yyyy") as "data").printSchema()

Thanks a lot.

Spark's date data type has no custom display format; date values are always shown as yyyy-MM-dd.

If the strings are in dd/MM/yyyy format, we cannot cast them to the date type directly (the cast results in null values).

Example:

df.show() //sample data

+----------+
|      data|
+----------+
|22/07/2003|
|01/08/2003|
+----------+

df.selectExpr("date(data)").show() //casting to date type

+----+
|data|
+----+
|null|
|null|
+----+
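For contrast, the same cast works once the strings already match the default yyyy-MM-dd pattern. A minimal sketch, assuming a Spark shell with spark.implicits._ in scope (the iso DataFrame is made up for illustration):

import spark.implicits._

val iso = Seq("2003-07-22", "2003-08-01").toDF("data")
iso.select($"data".cast("date").alias("data")).show()

// expected output:
// +----------+
// |      data|
// +----------+
// |2003-07-22|
// |2003-08-01|
// +----------+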

How to cast to DateType:

df.select(to_date(unix_timestamp($"data","dd/MM/yyyy").cast("timestamp")).alias("da")).show()

(or)

df.select(from_unixtime(unix_timestamp($"data","dd/MM/yyyy"),"yyyy-MM-dd").cast("date").alias("da")).show()

+----------+
|        da|
+----------+
|2003-07-22|
|2003-08-01|
+----------+

printSchema:

df.select(from_unixtime(unix_timestamp($"data","dd/MM/yyyy"),"yyyy-MM-dd").cast("date").alias("dd")).printSchema
root
 |-- dd: date (nullable = true)
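Since a date column always prints as yyyy-MM-dd, the usual approach for the dd/MM/yyyy requirement is to keep the column as a real date for filtering and sorting, and apply date_format only when displaying or writing the data out. A sketch of that workflow, assuming a Spark shell with a SparkSession named spark (column names are illustrative):

import org.apache.spark.sql.functions.{unix_timestamp, to_date, date_format}
import spark.implicits._

val df = Seq("22/07/2003", "01/08/2003").toDF("data")

// Parse dd/MM/yyyy strings into a proper DateType column (works on Spark 2.1).
val parsed = df.select(
  to_date(unix_timestamp($"data", "dd/MM/yyyy").cast("timestamp")).alias("data"))
parsed.printSchema() // data: date (nullable = true)

// Format back to dd/MM/yyyy only at output time; the result is a string again.
parsed.select(date_format($"data", "dd/MM/yyyy").alias("data_ddMMyyyy")).show()

Note that from Spark 2.2.0 onwards to_date accepts a format argument directly, so the unix_timestamp detour is no longer needed: to_date($"data", "dd/MM/yyyy").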
