
How to group by multiple columns

I want to group my dataframe by UserId, Date and category, and for each group compute the frequency of use per day (a count per category), the maximum duration per category, and the part of the day when it is most used. Finally, I want to store the result in a .csv file.

name       duration  UserId  category       part_of_day  Date
Settings      3.436       1  System tool    evening      2020-09-10
Calendar      2.167       1  Calendar       night        2020-09-11
Calendar      5.705       1  Calendar       night        2020-09-11
Messages      7.907       1  Phone_and_SMS  night        2020-09-11
Instagram    50.285       9  Social         night        2020-09-28
Drive        30.260       9  Productivity   night        2020-09-28

df.groupby(["UserId", "Date","category"])["category"].count()  

The result of my code is:

UserId  Date        category
1       2020-09-10  System tool          1
        2020-09-11  Calendar             8
                    Clock                2
                    Communication       86
                    Health & Fitness     5

But I want this result:

UserId  Date        category      count(category)  max-duration
1       2020-09-10  System tool                 1             3
        2020-09-11  Calendar                    2             5
2       2020-09-28  Social                      1            50
                    Productivity                1            30

How can I do that? None of the solutions I have found gives the result I want.

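From your question, it looks like you'd like a table with one row per combination of keys and its count. For this, consider the as_index parameter of groupby, which keeps the group keys as ordinary columns. Note that the grouping has to be done on the DataFrame, not on df.category alone, because the key columns do not exist on that Series:

df.groupby(["UserId", "Date", "category"], as_index=False)["name"].count()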
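
To illustrate what as_index changes, here is a minimal sketch built on the sample rows from the question. Counting the "name" column and renaming it to "count" are my own choices for the illustration; any column without missing values would give the same counts.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "name": ["Settings", "Calendar", "Calendar", "Messages", "Instagram", "Drive"],
    "duration": [3.436, 2.167, 5.705, 7.907, 50.285, 30.260],
    "UserId": [1, 1, 1, 1, 9, 9],
    "category": ["System tool", "Calendar", "Calendar", "Phone_and_SMS", "Social", "Productivity"],
    "part_of_day": ["evening", "night", "night", "night", "night", "night"],
    "Date": ["2020-09-10", "2020-09-11", "2020-09-11", "2020-09-11", "2020-09-28", "2020-09-28"],
})

keys = ["UserId", "Date", "category"]

# Default (as_index=True): the group keys become a MultiIndex on a Series
as_series = df.groupby(keys)["name"].count()

# as_index=False: the group keys stay as ordinary columns of a DataFrame
as_table = df.groupby(keys, as_index=False)["name"].count()
as_table = as_table.rename(columns={"name": "count"})  # "count" is an illustrative name

print(as_table)
```

The flat table is usually the more convenient shape if the next step is writing the result to a file.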

If you also want to calculate several statistics for each group, aggregate the relevant columns in one call:

grouped = df.groupby(["UserId", "Date","category"])

result = grouped.agg({'category': 'count', 'duration': 'max'})
result.columns = ['group_count','duration_max']

result = result.reset_index()

result
   UserId        Date       category  group_count  duration_max
0       1  2020-09-10    System tool            1         3.436
1       1  2020-09-11       Calendar            2         5.705
2       1  2020-09-11  Phone_and_SMS            1         7.907
3       9  2020-09-28   Productivity            1        30.260
4       9  2020-09-28         Social            1        50.285
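
The question also asks for the most-used part of the day per group and for a .csv file, which the aggregation above does not cover. The following is a sketch of one way to add both, under some assumptions: the helper most_common, the column name top_part_of_day and the file name usage_summary.csv are all made up for this example, and ties between parts of the day are broken by taking the first value that Series.mode() returns.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "name": ["Settings", "Calendar", "Calendar", "Messages", "Instagram", "Drive"],
    "duration": [3.436, 2.167, 5.705, 7.907, 50.285, 30.260],
    "UserId": [1, 1, 1, 1, 9, 9],
    "category": ["System tool", "Calendar", "Calendar", "Phone_and_SMS", "Social", "Productivity"],
    "part_of_day": ["evening", "night", "night", "night", "night", "night"],
    "Date": ["2020-09-10", "2020-09-11", "2020-09-11", "2020-09-11", "2020-09-28", "2020-09-28"],
})


def most_common(values):
    """Hypothetical helper: return the most frequent value in a Series.

    Series.mode() returns every tied value in sorted order; take the first.
    """
    return values.mode().iloc[0]


result = (
    df.groupby(["UserId", "Date", "category"])
      .agg(
          group_count=("name", "count"),               # rows per group
          duration_max=("duration", "max"),            # longest session per group
          top_part_of_day=("part_of_day", most_common),
      )
      .reset_index()
)

# Write the flat table to disk; the file name is an arbitrary choice
result.to_csv("usage_summary.csv", index=False)
print(result)
```

Passing index=False to to_csv leaves out the row numbers, so the file contains only the three key columns and the three aggregated columns.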

You can combine pandas.DataFrame.groupby and pandas.DataFrame.aggregate with named aggregation, which names the output columns directly so that no separate pandas.DataFrame.rename call is needed, to generate your desired output in one line:


code:

import pandas as pd

df = pd.DataFrame({'name': ['Settings','Calendar','Calendar', 'Messages', 'Instagram', 'Drive'],
                   'duration': [3.436, 2.167, 5.7050, 7.907, 50.285, 30.260],
                   'UserId': [1, 1, 1, 1, 2, 2],
                   'category' : ['System_tool', 'Calendar', 'Calendar', 'Phone_and_SMS', 'Social', 'Productivity'],
                   'part_of_day' : ['evening', 'night','night','night','night','night' ],
                   'Date' : ['2020-09-10', '2020-09-11', '2020-09-11', '2020-09-11', '2020-09-28', '2020-09-28'] })

df.groupby(['UserId', 'Date', 'category']).aggregate( count_cat = ('category', 'count'), max_duration = ('duration', 'max'))

out:

[image: output of the one-line command]
