Pandas group by into another dataframe

I have a dataframe in pandas like this:

Level_1 Level_2 Level_3 User_ID User_Flag
A       B       C       123     1
A       B       C       123     0
D       B       C       124     1
E       B       C       125     0
F       B       C       125     1

I need an output dataframe like this:

Level_1 Level_2 Level_3 Size Unique_User_Size Unique_User_Size_Condition
A       B       C       2    1                1
D       B       C       1    1                1
E       B       C       1    1                0
F       B       C       1    1                1

The group-by keys are Level_1, Level_2 and Level_3.

Size is the number of rows per group, count(*) in SQL.

Unique_User_Size is the number of distinct users per group, count(distinct user_id) in SQL.

Unique_User_Size_Condition is the number of distinct users per group with User_Flag=1, count(distinct case when user_flag=1 then user_id end) in SQL.

Can someone help me get this?

Here's one way to get there. It's a quick and dirty solution, but it works. I'm not aware of a built-in way to do a conditional distinct count inside a pandas aggregation, so I added a helper column called IDFlag, which holds the User_ID on rows where User_Flag == 1 and NaN everywhere else. Then you run the regular pd.Series.nunique aggregation on that column; since nunique ignores NaN by default, it counts only the flagged users. You could also write a lambda as the aggregation function that contains this logic, but that's a lateral move in terms of readability, IMO.

import pandas as pd

cols = ['Level_1','Level_2','Level_3','User_ID','User_Flag']
data = [['A','B','C',123,1],
        ['A','B','C',123,0],
        ['D','B','C',124,1],
        ['E','B','C',125,0],
        ['F','B','C',125,1]]
df = pd.DataFrame(data, columns=cols)

# Helper column: User_ID where User_Flag == 1, NaN on all other rows
df.loc[df['User_Flag'] == 1, 'IDFlag'] = df['User_ID']

# len counts rows per group; nunique counts distinct non-NaN values
agg_dict = {'IDFlag': pd.Series.nunique,
            'User_ID': [len, pd.Series.nunique]}

output = df.groupby(['Level_1','Level_2','Level_3']).agg(agg_dict)

# Make sure every count is an integer
output = output.astype(int)

Output:

                         IDFlag User_ID        
                        nunique     len nunique
Level_1 Level_2 Level_3                        
A       B       C             1       2       1
D       B       C             1       1       1
E       B       C             0       1       1
F       B       C             1       1       1
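
If you want the flat column names from the question rather than the two-level header above, here is a minimal sketch using named aggregation. It assumes a pandas version that supports keyword-style agg (0.25 or later); the names flagged, result and alt are only illustrative. It also includes the lambda variant mentioned earlier, to show that both routes give the same conditional count.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 'B', 'C', 123, 1],
     ['A', 'B', 'C', 123, 0],
     ['D', 'B', 'C', 124, 1],
     ['E', 'B', 'C', 125, 0],
     ['F', 'B', 'C', 125, 1]],
    columns=['Level_1', 'Level_2', 'Level_3', 'User_ID', 'User_Flag'],
)
keys = ['Level_1', 'Level_2', 'Level_3']

# Keep User_ID only where User_Flag == 1; the other rows become NaN,
# which nunique ignores by default.
flagged = df.assign(IDFlag=df['User_ID'].where(df['User_Flag'] == 1))

# Named aggregation: output_column=(input_column, aggregation)
result = flagged.groupby(keys, as_index=False).agg(
    Size=('User_ID', 'size'),
    Unique_User_Size=('User_ID', 'nunique'),
    Unique_User_Size_Condition=('IDFlag', 'nunique'),
)
print(result)

# Lambda variant (no helper column): look up User_Flag for the rows of
# each group through the group's index, then count distinct User_IDs.
alt = df.groupby(keys)['User_ID'].agg(
    lambda s: s[df.loc[s.index, 'User_Flag'] == 1].nunique()
)
```

The helper-column version keeps each aggregation a plain string, which tends to be easier to read than the lambda, and it does not reach back into the outer df from inside the aggregation function.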
