简体   繁体   English

如何使用非唯一列将具有求和值的熊猫Groupby数据框映射到另一个数据框

[英]How to map pandas Groupby dataframe with sum values to another dataframe using non-unique column

I have two pandas dataframe df1 and df2 . 我有两个熊猫数据框df1df2 Where i need to find df1['seq'] by doing a groupby on df2 and taking the sum of the column df2['sum_column'] . 我需要通过在df2上进行groupby并取df2['sum_column']列的总和来查找df1['seq'] df2['sum_column'] Below are sample data and my current solution. 以下是示例数据和我当前的解决方案。

df1 df1

id    code amount  seq
234     3    9.8    ?
213     3    18
241     3    6.4
543     3    2
524     2    1.8
142     2    14
987     2    11
658     3    17

df2 df2

c_id  name role    sum_column
1     Aus  leader    6
1     Aus  client    1
1     Aus  chair     7
2     Ned  chair     8
2     Ned  leader    3
3     Mar  client    5
3     Mar  chair     2
3     Mar  leader    4

grouped = df2.groupby('c_id')['sum_column'].sum()
df3 = grouped.reset_index()

df3 df3

c_id  sum_column
 1      14
 2      11
 3      11

The next step where am having issues is to map the df3 to df1 and conduct a conditional check to see if df1['amount'] is greater then df3['sum_column'] . 出现问题的下一步是将df3映射到df1并进行条件检查,以查看df1['amount']是否大于df3['sum_column']

df1['seq'] = np.where(df1['amount'] > df1['code'].map(df3.set_index('c_id')[sum_column]), 1, 0)

printing out df1['code'].map(df3.set_index('c_id')['sum_column']) , I get only NaN values. 打印出df1['code'].map(df3.set_index('c_id')['sum_column']) ,我只会得到NaN值。

Does anyone know what am doing wrong here? 有谁知道这里做错了什么?

Expected results: df1 预期结果: df1

id    code amount  seq
234     3    9.8    0
213     3    18     1
241     3    6.4    0
543     3    2      0
524     2    1.8    0
142     2    14     1
987     2    11     0
658     3    17     1

Solution should be simplify with remove .reset_index() for df3 and pass Series to map : 解决方案应简化为df3 remove .reset_index()并将Series传递给map

s = df2.groupby('c_id')['sum_column'].sum()
df1['seq'] = np.where(df1['amount'] > df1['code'].map(s), 1, 0)

Alternative with casting boolean mask to integer for True, False to 1,0 : 将布尔型掩码强制转换为True, False1,0整数:

df1['seq'] = (df1['amount'] > df1['code'].map(s)).astype(int)

print (df1)
    id  code  amount  seq
0  234     3     9.8    0
1  213     3    18.0    1
2  241     3     6.4    0
3  543     3     2.0    0
4  524     2     1.8    0
5  142     2    14.0    1
6  987     2    11.0    0
7  658     3    17.0    1

您忘记为sum_column添加报价

df1['seq']=np.where(df1['amount'] > df1['code'].map(df3.set_index('c_id')['sum_column']), 1, 0)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM