
SQL Spark - Group timestamp records by date, month and year

I have a dataframe that looks like this:

2019-04-17T17:21:00.963+0000    300
2019-04-17T17:21:21.000+0000    194
2019-04-17T17:21:30.096+0000    104
2019-04-17T17:22:00.243+0000    299
2019-04-17T17:22:20.290+0000    222
2019-04-17T17:22:30.376+0000    76
2019-04-17T17:22:50.570+0000    298
2019-04-17T17:23:20.760+0000    298

I would like to group these timestamps by day, month and year, abstracting away the hour/minute part.

query="""
SELECT day(InsertDate) as day,
month(InsertDate) as month,
year(InsertDate) as year,
count(ItemLogID) as value
FROM db_ods_aesbhist.ItemLogMessageInbox
group by day, month, year
ORDER BY value DESC
"""

df_input = spark.sql(query).toPandas().set_index(["year", "month", "day"])  # set_index() needs column names
display(df_input)

This is what I came up with, but it produces three separate columns, and I would like to keep using the date itself as the key.

Any idea how to do this?

Just found out that to_date() does the trick: it truncates the timestamp to a plain date, which can then be used both as the grouping key and as the index.
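For reference, a minimal sketch of that approach, assuming the same table and column names (db_ods_aesbhist.ItemLogMessageInbox, InsertDate, ItemLogID) as in the question:

query = """
SELECT to_date(InsertDate) AS date,
       count(ItemLogID)    AS value
FROM   db_ods_aesbhist.ItemLogMessageInbox
GROUP  BY to_date(InsertDate)
ORDER  BY value DESC
"""

# The single "date" column can now serve as the index/key.
df_input = spark.sql(query).toPandas().set_index("date")
display(df_input)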

Marking as Solved!
