Pandas: How to group by a datetime column, using only the time and discarding the date

Question

I have a dataframe with a datetime column. I want to group by the time component only and aggregate, e.g. by taking the mean.

I know that I can use pd.Grouper to group by date AND time, but it doesn't work on time only.

Say we have the following dataframe:

import numpy as np
import pandas as pd

drange = pd.date_range('2019-08-01 00:00', '2019-08-12 12:00', freq='1T')
time = drange.time
c0 = np.random.rand(len(drange))
c1 = np.random.rand(len(drange))
df = pd.DataFrame(dict(drange=drange, time=time, c0=c0, c1=c1))
print(df.head())

               drange      time        c0        c1
0 2019-08-01 00:00:00  00:00:00  0.031946  0.159739
1 2019-08-01 00:01:00  00:01:00  0.809171  0.681942
2 2019-08-01 00:02:00  00:02:00  0.036720  0.133443
3 2019-08-01 00:03:00  00:03:00  0.650522  0.409797
4 2019-08-01 00:04:00  00:04:00  0.239262  0.814565

In this case, the following throws a TypeError:

grouper = pd.Grouper(key='time', freq='5T')
grouped = df.groupby(grouper).mean()

I could set key='drange' to group by date and time, and then:

Reset the index
Transform the new column to float
Bin with pd.cut
Cast back to time
Finally group-by and then aggregate
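A rough sketch of that workaround follows. It rests on a few assumptions: it skips the initial date-and-time grouping, takes the time of day as seconds since midnight, hand-builds 5-minute bin edges, and casts each bin back to a Timedelta rather than a datetime.time. The `time_bin` column name is hypothetical.

```python
import numpy as np
import pandas as pd

# Same shape of data as above; '1min' is the same frequency as '1T'
rng = np.random.default_rng(0)
drange = pd.date_range('2019-08-01 00:00', '2019-08-12 12:00', freq='1min')
df = pd.DataFrame({'drange': drange,
                   'c0': rng.random(len(drange)),
                   'c1': rng.random(len(drange))})

# Time of day as a float: seconds since midnight
secs = (df['drange'] - df['drange'].dt.normalize()).dt.total_seconds()

# 5-minute bin edges covering one day: 0, 300, ..., 86400 seconds
edges = np.arange(0, 86400 + 300, 300)

# Bin with pd.cut; right=False makes each bin [start, start + 300)
# and each bin is labelled by its start in seconds
binned = pd.cut(secs, bins=edges, right=False, labels=edges[:-1])

# Cast the bin start back to a time of day (a Timedelta here, not datetime.time)
df['time_bin'] = pd.to_timedelta(binned.astype(int), unit='s')

result = df.groupby('time_bin')[['c0', 'c1']].mean()
print(result.head())
```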

... But I wonder whether there is a cleaner way to achieve the same results.

Answer 1

Series.dt.time / DatetimeIndex.time return the time as datetime.time objects. This isn't great, because pandas works best with timedelta64: your 'time' column is stored with object dtype and loses all datetime functionality.
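To make the dtype difference concrete, here is a small sketch on three timestamps (the names `as_time` and `as_delta` are only illustrative):

```python
import pandas as pd

s = pd.Series(pd.date_range('2019-08-01 00:00', periods=3, freq='1min'), name='drange')

as_time = s.dt.time               # Python datetime.time objects
as_delta = s - s.dt.normalize()   # time since midnight as Timedelta values

print(as_time.dtype)    # object: the .dt accessor is no longer available
print(as_delta.dtype)   # a timedelta64 dtype
# The timedelta version keeps the .dt accessor, e.g. total_seconds()
print(as_delta.dt.total_seconds().tolist())   # [0.0, 60.0, 120.0]
```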

You can subtract the normalized timestamp (midnight of the same day, from .dt.normalize()) to obtain the time of day as a timedelta, so you can keep using pandas' datetime tools. You can then floor this to group.

s = (df.drange - df.drange.dt.normalize()).dt.floor('5T')

df.groupby(s)[['c0', 'c1']].mean()

                c0        c1
drange                      
00:00:00  0.436971  0.530201
00:05:00  0.441387  0.518831
00:10:00  0.465008  0.478130
...            ...       ...
23:45:00  0.523233  0.515991
23:50:00  0.468695  0.434240
23:55:00  0.569989  0.510291

Alternatively, if you feel unsure about floor, this gives identical output up to the index name:

df['time'] = (df.drange - df.drange.dt.normalize())  # timedelta64[ns]
df.groupby(pd.Grouper(key='time', freq='5T'))[['c0', 'c1']].mean()
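A self-contained sketch that runs both variants side by side and compares them, assuming the question's 1-minute data with a fixed random seed (`tod`, `by_floor` and `by_grouper` are illustrative names):

```python
import numpy as np
import pandas as pd

# '1min' / '5min' are the same frequencies as '1T' / '5T'
rng = np.random.default_rng(0)
drange = pd.date_range('2019-08-01 00:00', '2019-08-12 12:00', freq='1min')
df = pd.DataFrame({'drange': drange,
                   'c0': rng.random(len(drange)),
                   'c1': rng.random(len(drange))})

tod = df['drange'] - df['drange'].dt.normalize()   # time of day, timedelta64

# Variant 1: floor the time of day, then group by it
by_floor = df.groupby(tod.dt.floor('5min'))[['c0', 'c1']].mean()

# Variant 2: store the time of day as a column and let pd.Grouper bin it
df['time'] = tod
by_grouper = df.groupby(pd.Grouper(key='time', freq='5min'))[['c0', 'c1']].mean()

# Same bins and values; only the index name differs
print(by_floor.index.name, by_grouper.index.name)   # drange time
print(np.allclose(by_floor.to_numpy(), by_grouper.to_numpy()))   # True
```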

Answer 2

When you use DataFrame.groupby, you can pass a Series as the argument. Moreover, if that Series holds datetimes, you can use the .dt accessor to get at its date and time properties. In your case df['drange'].dt.hour or df['drange'].dt.time should do it: the first groups by hour of day, the second by exact time of day.

# df['drange'] = pd.to_datetime(df['drange'])  # only needed if 'drange' is not already datetime64
df.groupby(df['drange'].dt.hour).agg(...)
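On the question's 1-minute data, dt.hour yields 24 groups and dt.time yields one group per minute, so neither gives 5-minute bins on its own. A minimal sketch, assuming 5-minute slots are what is wanted, that builds them from two .dt properties (`slot_hour` and `slot_minute` are illustrative names):

```python
import numpy as np
import pandas as pd

# '1min' is the same frequency as '1T' in the question
rng = np.random.default_rng(0)
drange = pd.date_range('2019-08-01 00:00', '2019-08-12 12:00', freq='1min')
df = pd.DataFrame({'drange': drange,
                   'c0': rng.random(len(drange)),
                   'c1': rng.random(len(drange))})

# One group per hour of the day
by_hour = df.groupby(df['drange'].dt.hour)[['c0', 'c1']].mean()

# 5-minute slots from .dt properties: hour plus minute rounded down to a multiple of 5
slot_hour = df['drange'].dt.hour.rename('hour')
slot_minute = (df['drange'].dt.minute // 5 * 5).rename('minute')
by_slot = df.groupby([slot_hour, slot_minute])[['c0', 'c1']].mean()

print(by_hour.shape)   # (24, 2)
print(by_slot.shape)   # (288, 2)
```

The result has a two-level (hour, minute) index instead of a single time-of-day index.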

Answer 3

I'm assuming by 5T that you're trying to group by time of day in 5-minute intervals?

Try this:

import pandas as pd
import numpy as np

drange = pd.date_range('2019-08-01 00:00', '2019-08-12 12:00', freq='5T')
time = drange.time
c0 = np.random.rand(len(drange))
c1 = np.random.rand(len(drange))
df = pd.DataFrame(dict(drange=drange, time=time, c0=c0, c1=c1))
df.groupby(df['time'])[['c0', 'c1']].agg('mean')

                c0        c1
time                        
00:00:00  0.503952  0.437320
00:05:00  0.437571  0.404878
00:10:00  0.524496  0.573247
00:15:00  0.517793  0.534535
00:20:00  0.434469  0.392725
...            ...       ...
23:35:00  0.533461  0.561525
23:40:00  0.633349  0.422529
23:45:00  0.427919  0.486180
23:50:00  0.497414  0.489659
23:55:00  0.561915  0.500814

[288 rows x 2 columns]
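The 288 five-minute rows above come from regenerating drange at freq='5T'; with the question's original 1-minute data, grouping by time gives one group per minute instead. A sketch that keeps the 1-minute data, assuming flooring to 5 minutes is the binning wanted: floor each timestamp first, then take .dt.time. Note the resulting index holds datetime.time objects (object dtype).

```python
import numpy as np
import pandas as pd

# Keep the question's 1-minute sampling ('1min' is the same as '1T')
rng = np.random.default_rng(0)
drange = pd.date_range('2019-08-01 00:00', '2019-08-12 12:00', freq='1min')
df = pd.DataFrame({'drange': drange,
                   'c0': rng.random(len(drange)),
                   'c1': rng.random(len(drange))})

# Floor each timestamp to its 5-minute slot, then drop the date with .dt.time
slot = df['drange'].dt.floor('5min').dt.time.rename('time')
result = df.groupby(slot)[['c0', 'c1']].mean()
print(result.shape)   # (288, 2)
```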


Answer 1: score 2, accepted, 2019-10-09 15:45:04
Answer 2: score 0, 2019-10-09 15:47:42
Answer 3: score 0, 2019-10-09 15:55:04
