Selecting rows with groupby, summing columns and creating a new column with the sum for all groupby elements
I have this dataframe:
nome code tipo score
Alexandre AAA Frads 4000
Alexandre AAA Memb 10000
Alexandre AAA Memb 20000
Bruno BBB Dans 10000
Bruno BBB Grap 4000
The values are available in this Google Sheet.
I need to create a new column that sums the score of the rows with the same nome and code where tipo = 'Memb', in a way that it looks like this:
I tried groupby with transform('sum'), but it gives me the wrong result:
df['score'].loc[df['tipo'] == "Memb"]=df[['nome','code','score']].groupby(['nome','code'])['score'].transform('sum')
What am I missing?
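The attempt groups only by nome and code, so transform('sum') adds up every score in the group, including the Frads, Dans and Grap rows; the tipo == 'Memb' condition only controls which rows receive those totals. It also writes into the existing score column through chained indexing (df['score'].loc[...]) instead of creating a new column. The sketch below rebuilds the sample frame from the table (assuming score is an integer column) and reproduces the values the attempt would produce, using a copy instead of chained assignment; the names all_tipos, is_memb and wrong are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    'nome': ['Alexandre', 'Alexandre', 'Alexandre', 'Bruno', 'Bruno'],
    'code': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'tipo': ['Frads', 'Memb', 'Memb', 'Dans', 'Grap'],
    'score': [4000, 10000, 20000, 10000, 4000],
})

# Grouping only by nome/code sums every tipo in the group,
# so the Frads, Dans and Grap scores end up in the totals.
all_tipos = df.groupby(['nome', 'code'])['score'].transform('sum')
print(all_tipos.tolist())  # [34000, 34000, 34000, 14000, 14000]

# The tipo == 'Memb' filter only decides which rows receive the
# (already wrong) totals; applied here to a copy, not via chained indexing.
is_memb = df['tipo'] == 'Memb'
wrong = df['score'].copy()
wrong.loc[is_memb] = all_tipos[is_memb]
print(wrong.tolist())  # [4000, 34000, 34000, 10000, 4000]
```

The fix is therefore to exclude the non-Memb scores before grouping, which is what the answers below do.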
To improve performance, you can replace the score values with 0 wherever tipo is not 'Memb' using Series.mask, and then use GroupBy.transform with sum:
df['Memb_sum'] = (df.assign(score=df['score'].mask(df['tipo'] != 'Memb', 0))
.groupby(['nome','code'])['score']
.transform('sum'))
print (df)
nome code tipo score Memb_sum
0 Alexandre AAA Frads 4000 30000
1 Alexandre AAA Memb 10000 30000
2 Alexandre AAA Memb 20000 30000
3 Bruno BBB Dans 10000 0
4 Bruno BBB Grap 4000 0
Details:
print (df.assign(score=df['score'].mask(df['tipo'] != 'Memb', 0)))
nome code tipo score
0 Alexandre AAA Frads 0
1 Alexandre AAA Memb 10000
2 Alexandre AAA Memb 20000
3 Bruno BBB Dans 0
4 Bruno BBB Grap 0
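The same idea can also be written with Series.where, which keeps values where the condition is True and replaces the rest, so it is the mirror image of mask(df['tipo'] != 'Memb', 0). The sketch below (the name memb_score is illustrative) also groups the masked Series directly by the key columns instead of building a temporary DataFrame with assign; it is an equivalent formulation, not a claim that it is faster.

```python
import pandas as pd

df = pd.DataFrame({
    'nome': ['Alexandre', 'Alexandre', 'Alexandre', 'Bruno', 'Bruno'],
    'code': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'tipo': ['Frads', 'Memb', 'Memb', 'Dans', 'Grap'],
    'score': [4000, 10000, 20000, 10000, 4000],
})

# Keep Memb scores, replace every other score with 0
memb_score = df['score'].where(df['tipo'].eq('Memb'), 0)

# Group the masked Series by the key columns (passed as Series aligned on
# df's index) and broadcast each group's total back to its rows
df['Memb_sum'] = memb_score.groupby([df['nome'], df['code']]).transform('sum')
print(df)
```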
You can try this.
Set 'tipo' as the index using df.set_index, then group the rows with the same nome and code using df.groupby, and use transform to sum only the rows whose index equals 'Memb':
df['Memb_sum'] = (df.set_index('tipo').
groupby(['nome','code']).score.
transform(lambda x:x.loc[x.index=='Memb'].sum()).
values)
Output:
nome code tipo score Memb_sum
0 Alexandre AAA Frads 4000 30000
1 Alexandre AAA Memb 10000 30000
2 Alexandre AAA Memb 20000 30000
3 Bruno BBB Dans 10000 0
4 Bruno BBB Grap 4000 0
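The trailing .values above matters: after set_index('tipo') the transformed Series is indexed by the (duplicated) tipo labels, so assigning it directly would make pandas try to align on those labels instead of on row position. A variant that keeps df's default index is sketched below; the helper mask is_memb is hypothetical. Inside transform each group keeps the original row labels, so the mask can be looked up per group with .loc and the result aligns back to df without .values. Like the original, it calls a Python lambda once per group, so it is likely slower than the mask-then-sum approach on large frames.

```python
import pandas as pd

df = pd.DataFrame({
    'nome': ['Alexandre', 'Alexandre', 'Alexandre', 'Bruno', 'Bruno'],
    'code': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'tipo': ['Frads', 'Memb', 'Memb', 'Dans', 'Grap'],
    'score': [4000, 10000, 20000, 10000, 4000],
})

# Precomputed boolean mask, indexed like df
is_memb = df['tipo'].eq('Memb')

# Each group s keeps df's row labels, so is_memb.loc[s.index] selects the
# matching mask values; the scalar sum is broadcast to every row of the group
# (an empty selection, as for Bruno, sums to 0)
df['Memb_sum'] = (df.groupby(['nome', 'code'])['score']
                    .transform(lambda s: s[is_memb.loc[s.index]].sum()))
print(df)
```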
import numpy as np
df['Memb_sum']=df.groupby(['nome','code','tipo'])['score'].transform('sum')
df['Memb_sum']=np.where(df['tipo'] != 'Memb', 0, df['Memb_sum'])
df['Memb_sum']=df.groupby(['nome','code'])['Memb_sum'].transform('max')
You can perform the groupby (including tipo) first, then zero out the non-Memb values and take the maximum per nome and code.
Output:
nome code tipo score Memb_sum
0 Alexandre AAA Frads 4000 30000
1 Alexandre AAA Memb 10000 30000
2 Alexandre AAA Memb 20000 30000
3 Bruno BBB Dans 10000 0
4 Bruno BBB Grap 4000 0
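The last step relies on transform('max') picking the Memb total over the zeros, which assumes the Memb totals are never negative; with negative scores, max would return 0 instead. A filter-first alternative, sketched below with illustrative names memb_totals and out, avoids that assumption: it aggregates only the Memb rows and then merges the totals back onto every original row, filling groups that have no Memb rows with 0.

```python
import pandas as pd

df = pd.DataFrame({
    'nome': ['Alexandre', 'Alexandre', 'Alexandre', 'Bruno', 'Bruno'],
    'code': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'tipo': ['Frads', 'Memb', 'Memb', 'Dans', 'Grap'],
    'score': [4000, 10000, 20000, 10000, 4000],
})

# One Memb total per (nome, code), computed from the Memb rows only
memb_totals = (df[df['tipo'].eq('Memb')]
               .groupby(['nome', 'code'], as_index=False)['score'].sum()
               .rename(columns={'score': 'Memb_sum'}))

# A left merge keeps every original row in its original order; groups
# without Memb rows get NaN, which is replaced by 0
out = df.merge(memb_totals, on=['nome', 'code'], how='left')
out['Memb_sum'] = out['Memb_sum'].fillna(0).astype(int)
print(out)
```

Note that merge returns a new frame with a fresh default index rather than modifying df in place.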