
How to group by multiple columns in python

I want to group my DataFrame by UserId, Date and category. For each group I need the frequency of use per day (the count per category), the maximum duration per category, and the part of the day when it is most used. Finally, I want to store the result in a .csv file.

name     duration  UserId  category    part_of_day    Date 
Settings   3.436    1    System tool      evening   2020-09-10 
Calendar   2.167    1    Calendar         night     2020-09-11 
Calendar   5.705    1    Calendar         night     2020-09-11 
Messages   7.907    1   Phone_and_SMS     night     2020-09-11 
Instagram   50.285   9   Social            night    2020-09-28  
Drive       30.260   9  Productivity       night    2020-09-28   

df.groupby(["UserId", "Date","category"])["category"].count()

The result of my code is:

 UserId      Date        category               
1       2020-09-10    System tool                  1
       2020-09-11     Calendar                     8
                     Clock                         2
                    Communication                  86
                  Health & Fitness                 5     

But I want this result:

 UserId      Date        category             count(category)  max-duration 
1       2020-09-10  System tool                  1            3

        2020-09-11  Calendar                     2            5

2       2020-09-28    Social                     1            50
                      Productivity               1            30

How can I do that? None of the solutions I have found gives the desired result.

Use agg:

df.groupby(["UserId", "Date","category"]).agg({'category':'count',
                                               'duration': 'max'})

To get the range of durations instead of the maximum, replace 'max' with lambda x: x.max() - x.min().
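
A minimal sketch of the agg approach on the question's sample rows, assuming the goal is the row count and the longest duration per (UserId, Date, category) group. Counting the name column instead of category is a choice made here only to keep the output column distinct from the category index level. The range variant is computed separately.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "name": ["Settings", "Calendar", "Calendar", "Messages", "Instagram", "Drive"],
    "duration": [3.436, 2.167, 5.705, 7.907, 50.285, 30.260],
    "UserId": [1, 1, 1, 1, 9, 9],
    "category": ["System tool", "Calendar", "Calendar",
                 "Phone_and_SMS", "Social", "Productivity"],
    "Date": ["2020-09-10", "2020-09-11", "2020-09-11",
             "2020-09-11", "2020-09-28", "2020-09-28"],
})

keys = ["UserId", "Date", "category"]

# One aggregation per column: number of rows and longest duration in each group.
counts_and_max = df.groupby(keys).agg({"name": "count", "duration": "max"})

# Range of durations per group (max minus min), as a separate Series.
duration_range = df.groupby(keys)["duration"].agg(lambda x: x.max() - x.min())

print(counts_and_max)
print(duration_range)
```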

Data

import pandas as pd

df = pd.DataFrame({'name    ': {0: 'Settings', 1: 'Calendar', 2: 'Calendar', 3: 'Messages', 4: 'Instagram', 5: 'Drive'}, '  duration': {0: 3.4360000000000004, 1: 2.167, 2: 5.705, 3: 7.907, 4: 50.285, 5: 30.26}, ' UserId': {0: 1, 1: 1, 2: 1, 3: 1, 4: 9, 5: 9}, '  category': {0: '       System tool', 1: '       Calendar', 2: '       Calendar', 3: '       Phone_and_SMS', 4: '       Social', 5: '       Productivity'}, '     part_of_day': {0: '  evening', 1: '     night  ', 2: '     night  ', 3: 'night  ', 4: '       night  ', 5: ' night  '}, ' Date': {0: '     2020-09-10', 1: '     2020-09-11', 2: '     2020-09-11', 3: '     2020-09-11', 4: '     2020-09-28', 5: '     2020-09-28'}})
df.columns = df.columns.str.strip()
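
Note that the line above strips only the column names. The padded cell values are left untouched, so values such as 'night  ' and '     night  ' would be treated as different if that column were grouped on. A small sketch of stripping the values as well, on a cut-down two-row frame that is illustrative only and not the full data:

```python
import pandas as pd

# Cut-down frame with the same kind of padding as the data above (illustrative only).
df = pd.DataFrame({
    " UserId": [1, 1],
    "  category": ["       Calendar", "       Calendar  "],
    "     part_of_day": ["     night  ", "night  "],
})

# Strip the column names, as in the answer.
df.columns = df.columns.str.strip()

# Also strip the string values, column by column.
for col in ["category", "part_of_day"]:
    df[col] = df[col].str.strip()

print(df["part_of_day"].unique())
```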

df:

        name  duration  UserId              category     part_of_day             Date
0   Settings     3.436       1           System tool         evening       2020-09-10
1   Calendar     2.167       1              Calendar         night         2020-09-11
2   Calendar     5.705       1              Calendar         night         2020-09-11
3   Messages     7.907       1         Phone_and_SMS         night         2020-09-11
4  Instagram    50.285       9                Social         night         2020-09-28
5      Drive    30.260       9          Productivity         night         2020-09-28

grouping = df.groupby(["UserId", "Date","category"]).agg({"category": 'count', 'duration': 'max'}).rename(columns={"duration" : "max-duration"})

grouping:

                                             category  max-duration
UserId Date            category                                    
1           2020-09-10        System tool           1         3.436
            2020-09-11        Calendar              2         5.705
                              Phone_and_SMS         1         7.907
9           2020-09-28        Productivity          1        30.260
                              Social                1        50.285
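
The question also asks to store the result in a .csv file. A possible sketch, assuming a flat table is wanted: the count column is given a name different from category so that reset_index can turn the three index levels back into ordinary columns without a name clash. The file name usage_summary.csv is a placeholder, not something from the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "name": ["Settings", "Calendar", "Calendar", "Messages", "Instagram", "Drive"],
    "duration": [3.436, 2.167, 5.705, 7.907, 50.285, 30.260],
    "UserId": [1, 1, 1, 1, 9, 9],
    "category": ["System tool", "Calendar", "Calendar",
                 "Phone_and_SMS", "Social", "Productivity"],
    "Date": ["2020-09-10", "2020-09-11", "2020-09-11",
             "2020-09-11", "2020-09-28", "2020-09-28"],
})

summary = (
    df.groupby(["UserId", "Date", "category"])
      .agg(count_category=("category", "count"),
           max_duration=("duration", "max"))
      .reset_index()  # index levels become regular columns
)

# With no path, to_csv returns the CSV text as a string.
csv_text = summary.to_csv(index=False)
print(csv_text)

# To write a file instead, pass a path (placeholder name):
# summary.to_csv("usage_summary.csv", index=False)
```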

You can combine pandas.DataFrame.groupby with pandas.DataFrame.aggregate to generate your desired output in one line. Named aggregation sets the output column names directly, so a separate pandas.DataFrame.rename call is not needed:


code:

import pandas as pd

df = pd.DataFrame({'name': ['Settings','Calendar','Calendar', 'Messages', 'Instagram', 'Drive'],
                   'duration': [3.436, 2.167, 5.7050, 7.907, 50.285, 30.260],
                   'UserId': [1, 1, 1, 1, 2, 2],
                   'category' : ['System_tool', 'Calendar', 'Calendar', 'Phone_and_SMS', 'Social', 'Productivity'],
                   'part_of_day' : ['evening', 'night','night','night','night','night' ],
                   'Date' : ['2020-09-10', '2020-09-11', '2020-09-11', '2020-09-11', '2020-09-28', '2020-09-28'] })

df.groupby(['UserId', 'Date', 'category']).aggregate( count_cat = ('category', 'count'), max_duration = ('duration', 'max'))

out:

[Image: output from one line]
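
The named aggregation can be extended to cover the remaining part of the question, the part of the day when each category is most used. This is a sketch under the assumption that "most used" means the most frequent part_of_day value within a group. The helper most_frequent and the column name top_part_of_day are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "name": ["Settings", "Calendar", "Calendar", "Messages", "Instagram", "Drive"],
    "duration": [3.436, 2.167, 5.705, 7.907, 50.285, 30.260],
    "UserId": [1, 1, 1, 1, 9, 9],
    "category": ["System tool", "Calendar", "Calendar",
                 "Phone_and_SMS", "Social", "Productivity"],
    "part_of_day": ["evening", "night", "night", "night", "night", "night"],
    "Date": ["2020-09-10", "2020-09-11", "2020-09-11",
             "2020-09-11", "2020-09-28", "2020-09-28"],
})


def most_frequent(values: pd.Series):
    """Return the most frequent value; on a tie, the first one in sorted order."""
    return values.mode().iat[0]


summary = df.groupby(["UserId", "Date", "category"]).agg(
    count_cat=("category", "count"),
    max_duration=("duration", "max"),
    top_part_of_day=("part_of_day", most_frequent),
)

print(summary)
```

If "most used" should instead mean the part of the day with the largest total duration, the helper would need to sum durations per part_of_day first, which this sketch does not do.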

