
Python: how to groupby a pandas dataframe to count by hour and day?

I have a dataframe like the following:

df.head(4)
    timestamp                  user_id   category
0  2017-09-23 15:00:00+00:00     A        Bar
1  2017-09-14 18:00:00+00:00     B        Restaurant
2  2017-09-30 00:00:00+00:00     B        Museum
3  2017-09-11 17:00:00+00:00     C        Museum

I would like to count, for each hour of each day, the number of visitors in each category, and get a dataframe like the following:

df 
     year month day   hour   category   count
0    2017  9     11    0       Bar       2
1    2017  9     11    1       Bar       1
2    2017  9     11    2       Bar       0
3    2017  9     11    3       Bar       1

Assuming you want to group by date and hour, you can use the following code, provided the timestamp column is a datetime column:

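# Use bracket assignment: df.year = ... would set an attribute, not create a column
df['year'] = df['timestamp'].dt.year
df['month'] = df['timestamp'].dt.month
df['day'] = df['timestamp'].dt.day
df['hour'] = df['timestamp'].dt.hour
grouped_data = df.groupby(['year','month','day','hour','category']).count()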
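The approach above can be tried end to end on the four sample rows from the question. This is only a minimal sketch: building the frame by hand, using size() rather than count(), and naming the result column count are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# The four sample rows shown in the question, rebuilt by hand.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        [
            "2017-09-23 15:00:00+00:00",
            "2017-09-14 18:00:00+00:00",
            "2017-09-30 00:00:00+00:00",
            "2017-09-11 17:00:00+00:00",
        ],
        utc=True,
    ),
    "user_id": ["A", "B", "B", "C"],
    "category": ["Bar", "Restaurant", "Museum", "Museum"],
})

# Add the datetime parts as real columns via the .dt accessor.
parts = df.assign(
    year=df["timestamp"].dt.year,
    month=df["timestamp"].dt.month,
    day=df["timestamp"].dt.day,
    hour=df["timestamp"].dt.hour,
)

# size() counts rows per group; reset_index(name=...) turns the
# result into a flat frame with a column called "count".
counts = (
    parts.groupby(["year", "month", "day", "hour", "category"])
    .size()
    .reset_index(name="count")
)
print(counts)
```

Only combinations that actually occur in the data appear in counts; hours without any visit are simply absent.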

To get the count of user_id per hour per category, you can group directly on the components of the datetime:

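import pandas as pd

# Make sure the column is a datetime so the .dt accessor is available
df['timestamp'] = pd.to_datetime(df['timestamp'])
# Group on the datetime parts plus the category column, then count user_id per group
df_new = df.groupby([df.timestamp.dt.year, 
                  df.timestamp.dt.month, 
                  df.timestamp.dt.day, 
                  df.timestamp.dt.hour, 
                  'category']).count()['user_id']
# The four datetime levels are all named 'timestamp', so give them distinct names
df_new.index.names = ['year', 'month', 'day', 'hour', 'category']
# The count column keeps the name 'user_id'
df_new = df_new.reset_index()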

When a dataframe column holds datetimes, you can use the dt accessor to get at the individual parts of each datetime, e.g. the year.
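Note that a plain groupby only returns hour/category combinations that actually occur, whereas the desired output in the question also shows rows with a count of 0. One possible way to get those rows is to reindex the counts against every hour and every category. The sketch below assumes a small made-up sample, a hypothetical helper column name hour_start, and that "every hour" means every hour between the first and last visit.

```python
import pandas as pd

# Hypothetical sample: three visits on one day, in two categories.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2017-09-11 00:00", "2017-09-11 00:30", "2017-09-11 03:00"],
        utc=True,
    ),
    "user_id": ["A", "B", "C"],
    "category": ["Bar", "Bar", "Museum"],
})

# Truncate each timestamp to the start of its hour.
hour_start = df["timestamp"].dt.floor("h").rename("hour_start")

# Count the rows for each (hour, category) pair that occurs.
counts = df.groupby([hour_start, "category"]).size()

# Build every (hour, category) pair between the first and last hour.
all_hours = pd.date_range(hour_start.min(), hour_start.max(), freq="h")
full_index = pd.MultiIndex.from_product(
    [all_hours, sorted(df["category"].unique())],
    names=["hour_start", "category"],
)

# Pairs missing from the data get a count of 0.
counts = counts.reindex(full_index, fill_value=0).reset_index(name="count")

# Split the hour back into the columns asked for in the question.
ts = counts["hour_start"]
result = counts.assign(
    year=ts.dt.year, month=ts.dt.month, day=ts.dt.day, hour=ts.dt.hour
)[["year", "month", "day", "hour", "category", "count"]]
print(result)
```

On real data the full grid can grow large (hours times categories), so it may be worth restricting the date range or the categories first.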


 