How to map a pandas groupby DataFrame with sum values to another DataFrame using a non-unique column
I have two pandas dataframes, df1 and df2. I need to find df1['seq'] by doing a groupby on df2 and taking the sum of the column df2['sum_column']. Below are sample data and my current solution.
df1:
id   code  amount  seq
234  3     9.8     ?
213  3     18
241  3     6.4
543  3     2
524  2     1.8
142  2     14
987  2     11
658  3     17
df2:
c_id  name  role    sum_column
1     Aus   leader  6
1     Aus   client  1
1     Aus   chair   7
2     Ned   chair   8
2     Ned   leader  3
3     Mar   client  5
3     Mar   chair   2
3     Mar   leader  4
grouped = df2.groupby('c_id')['sum_column'].sum()
df3 = grouped.reset_index()
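A self-contained sketch of the grouping step, rebuilding df2 from the sample table above. It shows that groupby(...).sum() already returns a Series indexed by c_id, and that reset_index() turns it back into a two-column DataFrame:

```python
import pandas as pd

# df2 rebuilt from the sample table in the question
df2 = pd.DataFrame({
    "c_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "name": ["Aus", "Aus", "Aus", "Ned", "Ned", "Mar", "Mar", "Mar"],
    "role": ["leader", "client", "chair", "chair", "leader", "client", "chair", "leader"],
    "sum_column": [6, 1, 7, 8, 3, 5, 2, 4],
})

# Series indexed by c_id, one summed value per group
grouped = df2.groupby("c_id")["sum_column"].sum()
print(grouped)

# reset_index() moves c_id from the index back into a regular column
df3 = grouped.reset_index()
print(df3)
```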
df3:
c_id  sum_column
1     14
2     11
3     11
The next step, where I am having issues, is to map df3 to df1 and run a conditional check to see whether df1['amount'] is greater than df3['sum_column'].
df1['seq'] = np.where(df1['amount'] > df1['code'].map(df3.set_index('c_id')[sum_column]), 1, 0)
When printing out df1['code'].map(df3.set_index('c_id')['sum_column']), I get only NaN values.
Does anyone know what I am doing wrong here?
Expected results for df1:
id   code  amount  seq
234  3     9.8     0
213  3     18      1
241  3     6.4     0
543  3     2       0
524  2     1.8     0
142  2     14      1
987  2     11      0
658  3     17      1
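Series.map with a Series argument matches each value against that Series' index, so an all-NaN result means no value of df1['code'] was found among the c_id labels. The post does not show the dtypes, so the following is only an assumption about one common cause: keys that look the same but have different dtypes (for example, codes read from a file as strings). A minimal sketch with hypothetical names s, codes_int and codes_str:

```python
import pandas as pd

# Lookup Series indexed by c_id (int64), shaped like groupby(...).sum() output
s = pd.Series([14, 11, 11], index=pd.Index([1, 2, 3], name="c_id"), name="sum_column")

codes_int = pd.Series([3, 2, 3])        # codes stored as integers
codes_str = pd.Series(["3", "2", "3"])  # hypothetical: codes stored as strings

print(codes_int.map(s).tolist())  # every code is found in the index
print(codes_str.map(s).tolist())  # "3" never equals 3, so every lookup is NaN

# Converting the keys to the index dtype restores the lookup
fixed = codes_str.astype("int64").map(s)
print(fixed.tolist())
```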
The solution can be simplified by removing .reset_index() for df3 and passing the Series to map:
s = df2.groupby('c_id')['sum_column'].sum()
df1['seq'] = np.where(df1['amount'] > df1['code'].map(s), 1, 0)
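A self-contained sketch of this answer end to end, assuming df1 and df2 are rebuilt from the sample tables in the question (the name and role columns of df2 are left out because they are not used). Because map looks each code up in the index of s, repeated codes in df1 are not a problem:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": [234, 213, 241, 543, 524, 142, 987, 658],
    "code": [3, 3, 3, 3, 2, 2, 2, 3],
    "amount": [9.8, 18, 6.4, 2, 1.8, 14, 11, 17],
})
df2 = pd.DataFrame({
    "c_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "sum_column": [6, 1, 7, 8, 3, 5, 2, 4],
})

# Series indexed by c_id; no reset_index() needed
s = df2.groupby("c_id")["sum_column"].sum()

# Each code is looked up in s's index; 1 where amount exceeds the group sum
df1["seq"] = np.where(df1["amount"] > df1["code"].map(s), 1, 0)
print(df1)
```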
An alternative is to cast the boolean mask to integers, converting True, False to 1, 0:
df1['seq'] = (df1['amount'] > df1['code'].map(s)).astype(int)
print (df1)
id code amount seq
0 234 3 9.8 0
1 213 3 18.0 1
2 241 3 6.4 0
3 543 3 2.0 0
4 524 2 1.8 0
5 142 2 14.0 1
6 987 2 11.0 0
7 658 3 17.0 1
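A small sketch, with made-up data, checking that the np.where and astype(int) variants agree. It also includes a hypothetical code 9 that has no matching c_id: map returns NaN for it, and a comparison against NaN evaluates to False, so that row gets 0 rather than raising an error:

```python
import numpy as np
import pandas as pd

# Made-up data; code 9 is deliberately missing from the lookup Series
df1 = pd.DataFrame({"code": [3, 3, 2, 9], "amount": [9.8, 18, 14, 11]})
s = pd.Series({1: 14, 2: 11, 3: 11})

mask = df1["amount"] > df1["code"].map(s)  # boolean Series; NaN comparison is False
via_where = np.where(mask, 1, 0)           # NumPy array of 0/1
via_astype = mask.astype(int)              # pandas Series of 0/1, keeps df1's index

print(via_astype.tolist())
```

The practical difference is the return type: np.where produces a plain NumPy array, while astype(int) keeps the pandas index.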
You forgot to add quotes around sum_column:
df1['seq']=np.where(df1['amount'] > df1['code'].map(df3.set_index('c_id')['sum_column']), 1, 0)
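With the quotes added, the original approach also works, because set_index('c_id') rebuilds the same c_id-indexed Series that the groupby returned before reset_index(). A small sketch with made-up data that checks both lookups agree:

```python
import numpy as np
import pandas as pd

# Made-up data in the same shape as the question's df1 and df2
df1 = pd.DataFrame({"code": [3, 2, 2], "amount": [18, 11, 14]})
df2 = pd.DataFrame({"c_id": [2, 2, 3, 3], "sum_column": [8, 3, 5, 6]})

grouped = df2.groupby("c_id")["sum_column"].sum()  # Series indexed by c_id
df3 = grouped.reset_index()                        # DataFrame with c_id, sum_column

# Quoted column name: the round trip through set_index gives back the grouped values
lookup = df3.set_index("c_id")["sum_column"]
df1["seq"] = np.where(df1["amount"] > df1["code"].map(lookup), 1, 0)
print(df1)
```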