Use groupby to select rows, sum a column, and create a new column with the total for all groupby elements

Question

I have this dataframe:

nome       code  tipo   score
Alexandre   AAA  Frads  4000
Alexandre   AAA  Memb   10000
Alexandre   AAA  Memb   20000
Bruno       BBB  Dans   10000
Bruno       BBB  Grap   4000

Values available in this Google Sheets

I need to create a new column that sums score over the rows with the same nome and code where tipo = 'Memb', so that every row of a nome/code group carries that total.

I tried groupby with transform('sum'), however it is giving me the wrong result.

df['score'].loc[df['tipo'] == "Memb"]=df[['nome','code','score']].groupby(['nome','code'])['score'].transform('sum')

What am I missing?

Answer 1

To improve performance, you can replace the score of every row whose tipo is not 'Memb' with 0 using Series.mask, and then use GroupBy.transform with sum:

df['Memb_sum']  = (df.assign(score=df['score'].mask(df['tipo'] != 'Memb', 0))
                     .groupby(['nome','code'])['score']
                     .transform('sum'))
print (df)
        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0

Details:

print (df.assign(score=df['score'].mask(df['tipo'] != 'Memb', 0)))

        nome code   tipo  score
0  Alexandre  AAA  Frads      0
1  Alexandre  AAA   Memb  10000
2  Alexandre  AAA   Memb  20000
3      Bruno  BBB   Dans      0
4      Bruno  BBB   Grap      0
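
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame construction is my own reconstruction of the sample data in the question, and the intermediate name `masked` is only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "nome": ["Alexandre", "Alexandre", "Alexandre", "Bruno", "Bruno"],
    "code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "tipo": ["Frads", "Memb", "Memb", "Dans", "Grap"],
    "score": [4000, 10000, 20000, 10000, 4000],
})

# Zero out score wherever tipo is not 'Memb'. assign returns a modified
# copy, so the score column of df itself is left untouched.
masked = df.assign(score=df["score"].mask(df["tipo"] != "Memb", 0))

# transform('sum') returns one value per input row, so the result can be
# assigned straight back as a new column.
df["Memb_sum"] = masked.groupby(["nome", "code"])["score"].transform("sum")

print(df)
```

Because the filtering happens before the groupby, the aggregation itself is a plain built-in `sum`, which is what lets pandas avoid calling a Python function per group.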

Answer 2

You can try this.

Set 'tipo' as the index using df.set_index, group the rows that share the same nome and code using df.groupby, and then use transform to sum, within each group, only the values whose index equals 'Memb'.

df['Memb_sum'] =  (df.set_index('tipo').
                     groupby(['nome','code']).score.
                     transform(lambda x:x.loc[x.index=='Memb'].sum()).
                     values)

Output:

        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0
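
A note on the trailing `.values`: after `set_index('tipo')` the transformed Series is labelled by tipo rather than by the original row index, so the result is assigned by position instead of by label. The sketch below splits the chain to make that visible. The sample DataFrame is rebuilt from the question, `by_tipo` is a name I made up, and `to_numpy()` is used in place of `.values` purely as a matter of taste.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "nome": ["Alexandre", "Alexandre", "Alexandre", "Bruno", "Bruno"],
    "code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "tipo": ["Frads", "Memb", "Memb", "Dans", "Grap"],
    "score": [4000, 10000, 20000, 10000, 4000],
})

# Within each (nome, code) group, keep only the entries labelled 'Memb'
# and sum them; transform broadcasts that scalar back to every row.
by_tipo = (
    df.set_index("tipo")
      .groupby(["nome", "code"])["score"]
      .transform(lambda s: s.loc[s.index == "Memb"].sum())
)

# The result keeps the row order of df but is indexed by tipo, so strip
# the index and assign positionally.
df["Memb_sum"] = by_tipo.to_numpy()

print(by_tipo.index.tolist())
print(df)
```

This relies on `set_index` and `transform` both preserving row order, which holds here; it is worth keeping in mind if the frame is sorted or filtered between the two steps.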

Answer 3

import numpy as np

df['Memb_sum']=df.groupby(['nome','code','tipo'])['score'].transform('sum')

df['Memb_sum']=np.where(df['tipo'] != 'Memb', 0, df['Memb_sum'])

df['Memb_sum']=df.groupby(['nome','code'])['Memb_sum'].transform('max')

You can perform the groupby first and filter out the unwanted values afterwards.

Output:

        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0
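
To make the three steps easier to follow, here is a sketch that keeps each intermediate result in its own variable instead of overwriting `Memb_sum`. The sample DataFrame is rebuilt from the question, and the names `per_tipo` and `memb_only` are my own.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "nome": ["Alexandre", "Alexandre", "Alexandre", "Bruno", "Bruno"],
    "code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "tipo": ["Frads", "Memb", "Memb", "Dans", "Grap"],
    "score": [4000, 10000, 20000, 10000, 4000],
})

# Step 1: total score per (nome, code, tipo), repeated on every row.
per_tipo = df.groupby(["nome", "code", "tipo"])["score"].transform("sum")

# Step 2: keep that total only on 'Memb' rows, 0 everywhere else.
memb_only = pd.Series(
    np.where(df["tipo"] != "Memb", 0, per_tipo), index=df.index
)

# Step 3: spread the 'Memb' total to every row of the (nome, code) group.
df["Memb_sum"] = memb_only.groupby([df["nome"], df["code"]]).transform("max")

print(df)
```

One caveat with the final `max`: it picks the 'Memb' total only because that total is not below the 0 placed on the other rows. If scores could be negative, a group with a negative 'Memb' total and at least one other tipo would come out as 0.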
