I have a DataFrame with a string column called "data" holding dates in the format dd/MM/yyyy (for example, 02/09/2019). I want to change the column's data type from STRING to DATE while keeping the same format. I'm using Spark 2.1.0.
I've tried the statement:
df.select(to_date( unix_timestamp($"data", "dd/MM/yyyy").cast("timestamp")))
It converts the column from STRING to DATE but in yyyy-MM-dd format:
+----------+
|      data|
+----------+
|2003-07-22|
|2003-08-01|
+----------+
Using the date_format function, I get the right format but the wrong data type (STRING again):
df.select(date_format(to_date( unix_timestamp($"data", "dd/MM/yyyy").cast("timestamp")), "dd/MM/yyyy") as "data").printSchema()
Thanks a lot.
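The Date datatype expects values in yyyy-MM-dd format. A date column stores no display format of its own, so Spark always shows it as yyyy-MM-dd.
If the strings are in dd/MM/yyyy format, they cannot be cast directly to the date datatype (the cast returns null).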
Example:
df.show() //sample data
+----------+
|      data|
+----------+
|22/07/2003|
|01/08/2003|
+----------+
df.selectExpr("date(data)").show() //casting to date type
+----+
|data|
+----+
|null|
|null|
+----+
How do we cast to DateType?
df.select(to_date(unix_timestamp($"data","dd/MM/yyyy").cast("timestamp")).alias("da")).show()
(or)
df.select(from_unixtime(unix_timestamp($"data","dd/MM/yyyy"),"yyyy-MM-dd").cast("date").alias("da")).show()
+----------+
|        da|
+----------+
|2003-07-22|
|2003-08-01|
+----------+
printSchema:
df.select(from_unixtime(unix_timestamp($"data","dd/MM/yyyy"),"yyyy-MM-dd").cast("date").alias("dd")).printSchema
root
|-- dd: date (nullable = true)
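Putting the pieces together, one possible approach is to keep the column as a real date for storage and computation, and apply date_format only at the point where the dd/MM/yyyy text is needed (display or export). The following is a minimal sketch under that assumption, not a definitive implementation: the object name DateColumnSketch and the helpers parseDates and formatForDisplay are hypothetical names introduced for illustration, and the local SparkSession is only there to make the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_format, to_date, unix_timestamp}

// Hypothetical helper object, introduced only for this illustration.
object DateColumnSketch {

  // Parse the dd/MM/yyyy strings in "data" into a DateType column.
  // Uses unix_timestamp + cast, which is available in Spark 2.1.0.
  def parseDates(df: DataFrame): DataFrame =
    df.withColumn(
      "data",
      to_date(unix_timestamp(col("data"), "dd/MM/yyyy").cast("timestamp")))

  // Render the DateType column back as dd/MM/yyyy text.
  // The result is a STRING column, so use it only for output.
  def formatForDisplay(df: DataFrame): DataFrame =
    df.select(date_format(col("data"), "dd/MM/yyyy").as("data"))

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch runs on its own.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("date-column-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("22/07/2003", "01/08/2003").toDF("data")

    val typed = parseDates(df)
    typed.printSchema()            // "data" is now of type date
    typed.show()                   // shown as yyyy-MM-dd

    formatForDisplay(typed).show() // shown as dd/MM/yyyy, type string

    spark.stop()
  }
}
```

On Spark 2.2.0 or later, to_date also accepts a format argument, so to_date($"data", "dd/MM/yyyy") performs the parse in one call. That overload is not available in 2.1.0.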