
Selecting rows with groupby, summing columns and creating new column with the sum for all groupby elements

I have this dataframe:

nome       code  tipo   score
Alexandre   AAA  Frads  4000
Alexandre   AAA  Memb   10000
Alexandre   AAA  Memb   20000
Bruno       BBB  Dans   10000
Bruno       BBB  Grap   4000

Values available in this Google Sheets.

I need to create a new column that sums the rows with the same nome and code where tipo = 'Memb', so that it looks like this:

[image: desired output]

I tried groupby with transform('sum'), but it gives me the wrong result.

df['score'].loc[df['tipo'] == "Memb"]=df[['nome','code','score']].groupby(['nome','code'])['score'].transform('sum')

[image: result of the attempt]

What am I missing?

To improve performance, you can replace score with 0 on the rows where tipo is not 'Memb' using Series.mask, and then use GroupBy.transform with sum:

df['Memb_sum']  = (df.assign(score=df['score'].mask(df['tipo'] != 'Memb', 0))
                     .groupby(['nome','code'])['score']
                     .transform('sum'))
print (df)
        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0

Details:

print (df.assign(score=df['score'].mask(df['tipo'] != 'Memb', 0)))

        nome code   tipo  score
0  Alexandre  AAA  Frads      0
1  Alexandre  AAA   Memb  10000
2  Alexandre  AAA   Memb  20000
3      Bruno  BBB   Dans      0
4      Bruno  BBB   Grap      0   
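
The mask-then-transform idea above can be sketched as a self-contained script. The DataFrame is rebuilt from the sample in the question, and the intermediate name memb_only is only illustrative. This variant groups the masked Series directly instead of going through assign, which is an assumption about style rather than part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "nome": ["Alexandre", "Alexandre", "Alexandre", "Bruno", "Bruno"],
    "code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "tipo": ["Frads", "Memb", "Memb", "Dans", "Grap"],
    "score": [4000, 10000, 20000, 10000, 4000],
})

# Zero out every score whose tipo is not 'Memb'; df["score"] itself is untouched.
memb_only = df["score"].mask(df["tipo"] != "Memb", 0)

# Sum the masked scores per (nome, code) and broadcast the sum back to each row.
df["Memb_sum"] = memb_only.groupby([df["nome"], df["code"]]).transform("sum")

print(df)
```

Because transform returns a Series with the same index as its input, the assignment lines up row by row without any reindexing.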

You can try this.

Set 'tipo' as the index using df.set_index, then group the rows with the same nome and code using df.groupby, and use GroupBy.transform with a function that sums only the values whose index label equals 'Memb'. The trailing .values assigns the result by position, because after set_index the result is indexed by tipo rather than by the original row labels.

df['Memb_sum'] =  (df.set_index('tipo').
                     groupby(['nome','code']).score.
                     transform(lambda x:x.loc[x.index=='Memb'].sum()).
                     values)

Output:

        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0
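
If relying on positional assignment through .values feels fragile, one possible alternative is to aggregate the 'Memb' rows separately and look the sums up by key. This is a different technique from the one in the answer above (filter, aggregate, then DataFrame.join), shown only as a sketch; the names memb_sums and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "nome": ["Alexandre", "Alexandre", "Alexandre", "Bruno", "Bruno"],
    "code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "tipo": ["Frads", "Memb", "Memb", "Dans", "Grap"],
    "score": [4000, 10000, 20000, 10000, 4000],
})

# Sum only the 'Memb' rows, giving one value per (nome, code) pair.
memb_sums = (
    df[df["tipo"] == "Memb"]
    .groupby(["nome", "code"])["score"]
    .sum()
    .rename("Memb_sum")
)

# Look the sums up by key; groups with no 'Memb' rows get NaN, which becomes 0.
result = df.join(memb_sums, on=["nome", "code"])
result["Memb_sum"] = result["Memb_sum"].fillna(0).astype("int64")

print(result)
```

The join keeps the original row index, so no positional assumptions are needed, at the cost of an extra fillna step for groups without any 'Memb' row.
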
import numpy as np

df['Memb_sum']=df.groupby(['nome','code','tipo'])['score'].transform('sum')

df['Memb_sum']=np.where(df['tipo'] != 'Memb', 0, df['Memb_sum'])

df['Memb_sum']=df.groupby(['nome','code'])['Memb_sum'].transform('max')

You can perform the groupby first and filter out the unwanted values afterwards.

Output:

        nome code   tipo  score  Memb_sum
0  Alexandre  AAA  Frads   4000     30000
1  Alexandre  AAA   Memb  10000     30000
2  Alexandre  AAA   Memb  20000     30000
3      Bruno  BBB   Dans  10000         0
4      Bruno  BBB   Grap   4000         0
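
The three steps of this last approach can be traced with named intermediates. The sketch below rebuilds the sample data from the question; per_tipo and memb_or_zero are illustrative names. One assumption worth stating: the final 'max' step relies on the 'Memb' sum being non-negative whenever a group also contains non-'Memb' rows, since those rows contribute 0 to the maximum.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "nome": ["Alexandre", "Alexandre", "Alexandre", "Bruno", "Bruno"],
    "code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "tipo": ["Frads", "Memb", "Memb", "Dans", "Grap"],
    "score": [4000, 10000, 20000, 10000, 4000],
})

# Step 1: sum per (nome, code, tipo), so each row carries the total for its own tipo.
per_tipo = df.groupby(["nome", "code", "tipo"])["score"].transform("sum")

# Step 2: keep that total only on 'Memb' rows; every other row becomes 0.
memb_or_zero = pd.Series(np.where(df["tipo"] != "Memb", 0, per_tipo), index=df.index)

# Step 3: spread the largest value in each (nome, code) group to all of its rows.
df["Memb_sum"] = memb_or_zero.groupby([df["nome"], df["code"]]).transform("max")

print(df)
```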
