I want to group by my dataframe by different columns based on UserId,Date,category (frequency of use per day ),max duration per category,and the part of the day when it is most used and finally store the result in a.csv file.
name duration UserId category part_of_day Date
Settings 3.436 1 System tool evening 2020-09-10
Calendar 2.167 1 Calendar night 2020-09-11
Calendar 5.705 1 Calendar night 2020-09-11
Messages 7.907 1 Phone_and_SMS night 2020-09-11
Instagram 50.285 9 Social night 2020-09-28
Drive 30.260 9 Productivity night 2020-09-28
df.groupby(["UserId", "Date","category"])["category"].count()
my code result is:
UserId Date category
1 2020-09-10 System tool 1
2020-09-11 Calendar 8
Clock 2
Communication 86
Health & Fitness 5
But i want this result
UserId Date category count(category) max-duration
1 2020-09-10 System tool 1 3
2020-09-11 Calendar 2 5
2 2020-09-28 Social 1 50
Productivity 1 30
How can I do that? I can not find the wanted result for any solution
From your question, it looks like you'd like to make a table with each combination and the count. For this, you might consider using the as_index
parameter in groupby
:
df.category.groupby(["UserId", "Date"], as_index=False).count()
It looks like you might be wanting to calculate statistics for each group.
grouped = df.groupby(["UserId", "Date","category"])
result = grouped.agg({'category': 'count', 'duration': 'max'})
result.columns = ['group_count','duration_max']
result = result.reset_index()
result
UserId Date category group_count duration_max
0 1 2020-09-10 System tool 1 3.436
1 1 2020-09-11 Calendar 2 5.705
2 1 2020-09-11 Phone_and_SMS 1 7.907
3 9 2020-09-28 Productivity 1 30.260
4 9 2020-09-28 Social 1 50.285
You take advantage of pandas.DataFrame.groupby , pandas.DataFrame.aggregate and pandas.DataFrame.rename in following format to generate your desired output in one line:
import pandas as pd
df = pd.DataFrame({'name': ['Settings','Calendar','Calendar', 'Messages', 'Instagram', 'Drive'],
'duration': [3.436, 2.167, 5.7050, 7.907, 50.285, 30.260],
'UserId': [1, 1, 1, 1, 2, 2],
'category' : ['System_tool', 'Calendar', 'Calendar', 'Phone_and_SMS', 'Social', 'Productivity'],
'part_of_day' : ['evening', 'night','night','night','night','night' ],
'Date' : ['2020-09-10', '2020-09-11', '2020-09-11', '2020-09-11', '2020-09-28', '2020-09-28'] })
df.groupby(['UserId', 'Date', 'category']).aggregate( count_cat = ('category', 'count'), max_duration = ('duration', 'max'))
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.