I have a dataframe with a column date_key of DateType. I want to create another column containing only the yyyy-MM part of date_key, while still keeping it as a date type. I tried to_date(df[date_key], 'YYYY-MM'), which does not work. I also tried date_format(df[date_key], 'YYYY-MM'), but the result is a string rather than a date. Could someone please help? Many thanks. The result I need is in the format 2020-09, with no day or timestamp after it.
You can use date_trunc to reduce the precision of a timestamp:
import pyspark.sql.functions as f
from pyspark.sql.functions import col, to_date
df = spark.createDataFrame([['2020-09-30'], ['2020-11-11']], ['date']) \
    .select(to_date(col('date'), 'yyyy-MM-dd').alias('date_key'))
df.show()
+----------+
|  date_key|
+----------+
|2020-09-30|
|2020-11-11|
+----------+
Then truncate:
df.select(f.date_trunc('mm', col('date_key'))).show()
+------------------------+
|date_trunc(mm, date_key)|
+------------------------+
|     2020-09-01 00:00:00|
|     2020-11-01 00:00:00|
+------------------------+
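date_trunc keeps the precision up to the specified unit ('mm' here, meaning month) and resets everything below it, so each value becomes the first day of its month at midnight. As the output shows, the result is a timestamp rather than a bare 2020-09.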
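To make the trade-off concrete, here is a minimal plain-Python sketch (deliberately not Spark code) of the two options discussed above. It assumes that a date value always carries a day component, so "only yyyy-MM" can be either a date pinned to the first of the month or a string label. The helper names truncate_to_month and year_month_label are hypothetical and exist only for this illustration. In PySpark, the closest built-ins should be trunc(col, 'month'), which returns a date, and date_format(col, 'yyyy-MM'), which returns a string.

```python
from datetime import date


def truncate_to_month(d: date) -> date:
    """Return the first day of d's month; the result is still a date."""
    # Rough analogue of month truncation in Spark, minus the time part.
    return d.replace(day=1)


def year_month_label(d: date) -> str:
    """Return a 'yyyy-MM' label; the result is a string, not a date."""
    return d.strftime("%Y-%m")


# Same sample values as the date_key column above.
date_keys = [date(2020, 9, 30), date(2020, 11, 11)]

truncated = [truncate_to_month(d) for d in date_keys]
labels = [year_month_label(d) for d in date_keys]

print(truncated)  # date objects, each pinned to day 1 of its month
print(labels)     # plain strings such as '2020-09'
```

If the column has to stay a date for joins or grouping, the truncated form is probably the safer choice, with the yyyy-MM string produced only at display time.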