Pandas group by into another dataframe
I have a dataframe in pandas like this:
Level_1  Level_2  Level_3  User_ID  User_Flag
A        B        C        123      1
A        B        C        123      0
D        B        C        124      1
E        B        C        125      0
F        B        C        125      1
I need an output dataframe like this:
Level_1  Level_2  Level_3  Size  Unique_User_Size  Unique_User_Size_Condition
A        B        C        2     1                 1
D        B        C        1     1                 1
E        B        C        1     1                 0
F        B        C        1     1                 1
So the grouping columns are Level_1, Level_2, Level_3.
Size is the number of rows per group, count(*) in SQL.
Unique_User_Size is the number of distinct users per group, count(distinct user_id) in SQL.
Unique_User_Size_Condition is the number of distinct users per group with User_Flag=1, count(distinct case when user_flag=1 then user_id end) in SQL.
Can someone help me get this?
Here's one way to get there. It's a quick-and-dirty, not very clean-looking solution, but it's one approach. I'm not aware of a built-in way to do the conditional unique aggregation directly, so I added a new column called IDFlag, which holds the User_ID on rows where User_Flag == 1 and NaN everywhere else. Then you do the regular pd.Series.nunique aggregation on that column; nunique ignores NaN by default, so only flagged users are counted. You could also write a lambda containing this logic as the aggregation function, but that's a lateral move in terms of readability, IMO.
import pandas as pd

cols = ['Level_1', 'Level_2', 'Level_3', 'User_ID', 'User_Flag']
data = [['A', 'B', 'C', 123, 1],
        ['A', 'B', 'C', 123, 0],
        ['D', 'B', 'C', 124, 1],
        ['E', 'B', 'C', 125, 0],
        ['F', 'B', 'C', 125, 1]]
df = pd.DataFrame(data, columns=cols)

# IDFlag holds User_ID where User_Flag == 1 and NaN on all other rows
df.loc[df['User_Flag'] == 1, 'IDFlag'] = df['User_ID']

# len counts rows per group; nunique counts distinct non-NaN values
agg_dict = {'IDFlag': pd.Series.nunique,
            'User_ID': [len, pd.Series.nunique]}
output = df.groupby(['Level_1', 'Level_2', 'Level_3']).agg(agg_dict)
output = output.astype(int)
Output:
                         IDFlag User_ID
                        nunique     len nunique
Level_1 Level_2 Level_3
A       B       C             1       2       1
D       B       C             1       1       1
E       B       C             0       1       1
F       B       C             1       1       1
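If you want the flat column names from the question rather than the two-level header above, one possible variation is pandas named aggregation (available from pandas 0.25 onward). This is only a sketch of the same idea, not a different method: the helper column name Flagged_ID and the variable names are made up for illustration, and the second part shows roughly what the lambda alternative mentioned above could look like.

```python
import pandas as pd

df = pd.DataFrame({
    "Level_1": ["A", "A", "D", "E", "F"],
    "Level_2": ["B"] * 5,
    "Level_3": ["C"] * 5,
    "User_ID": [123, 123, 124, 125, 125],
    "User_Flag": [1, 0, 1, 0, 1],
})
keys = ["Level_1", "Level_2", "Level_3"]

# Flagged_ID is a hypothetical helper column: User_ID where User_Flag == 1,
# NaN elsewhere. nunique skips NaN by default, so unflagged rows do not count.
flagged = df.assign(Flagged_ID=df["User_ID"].where(df["User_Flag"] == 1))

result = (
    flagged.groupby(keys)
    .agg(
        Size=("User_ID", "size"),                              # count(*)
        Unique_User_Size=("User_ID", "nunique"),               # count(distinct user_id)
        Unique_User_Size_Condition=("Flagged_ID", "nunique"),  # distinct flagged users
    )
    .reset_index()
)
print(result)

# Lambda variant without a helper column: each group's User_ID Series keeps
# the original row index, so the flag can be looked up from df by index.
result_lambda = (
    df.groupby(keys)
    .agg(
        Size=("User_ID", "size"),
        Unique_User_Size=("User_ID", "nunique"),
        Unique_User_Size_Condition=(
            "User_ID",
            lambda s: s[df.loc[s.index, "User_Flag"] == 1].nunique(),
        ),
    )
    .reset_index()
)
```

Both versions should give one row per (Level_1, Level_2, Level_3) group with the three counts as ordinary columns. The helper-column version is probably easier to read; the lambda version avoids the extra column at the cost of reaching back into df from inside the aggregation.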