Groupby and sum with pandas for certain columns while also keeping other columns
I have the following data:
import pandas as pd
x4 = pd.DataFrame({"ID": [101,101, 102, 103, 104, 105],
"Prob": [1, 1,1, 1, 1, 1],
"Ef": [0,2, 0, 0, 0.25, 0.29],
"W": [2, 2,3, 4, 5, 6],
"EC": [0, 0,0, 0, 1.6, 2],
"Rand": [11, 12,12, 13, 14, 15]})
I would like to get sum(Prob * Ef) by ID, and then keep only the ID column, the column with the sum, the EC column and the W column. The number of rows should stay the same, with each row showing the sum for its ID.
So in the end I want to have this:
ID sum_column EC W
1: 101 2.00 0.0 2
2: 101 2.00 0.0 2
3: 102 0.00 0.0 3
4: 103 0.00 0.0 4
5: 104 0.25 1.6 5
6: 105 0.29 2.0 6
I have tried this:
x4.loc[:, ['EC','W','ID','Prob','Ef']].groupby('ID').sum(Prob*Ef)
but it does not work.
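For comparison with the desired output, here is a minimal sketch of an aggregation that does run. It re-creates the x4 frame from above so the block is self-contained, and the names product and per_id are illustrative. Computing the product as column data first makes the sum work, but a groupby sum returns one row per ID (5 rows), whereas the desired output keeps all 6 rows:

```python
import pandas as pd

x4 = pd.DataFrame({"ID": [101, 101, 102, 103, 104, 105],
                   "Prob": [1, 1, 1, 1, 1, 1],
                   "Ef": [0, 2, 0, 0, 0.25, 0.29],
                   "W": [2, 2, 3, 4, 5, 6],
                   "EC": [0, 0, 0, 0, 1.6, 2],
                   "Rand": [11, 12, 12, 13, 14, 15]})

# Build the product from the actual columns; the bare names Prob and Ef
# are not Python variables, so Prob*Ef cannot be evaluated on its own.
product = x4['Prob'] * x4['Ef']

# Aggregating collapses each ID into a single row (indexed by ID).
per_id = product.groupby(x4['ID']).sum()
print(per_id)
```

This is why an aggregation alone is not enough here: the per-ID totals still have to be spread back onto the original rows.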
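Use GroupBy.transform on the multiplied columns. The attempt above fails because Prob and Ef are not Python variables, and a plain sum would in any case collapse each ID to one row. Instead, multiply the columns first, group the product by ID and call transform('sum'), which returns a Series aligned with the original rows, so every row gets the total for its ID: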
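# Row-wise product, summed within each ID and broadcast back to every row
x4['sum_column'] = x4['Prob'].mul(x4['Ef']).groupby(x4['ID']).transform('sum')
# Drop the columns that are no longer needed
x4 = x4.drop(['Ef','Prob', 'Rand'], axis=1)
print(x4)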
ID W EC sum_column
0 101 2 0.0 2.00
1 101 2 0.0 2.00
2 102 3 0.0 0.00
3 103 4 0.0 0.00
4 104 5 1.6 0.25
5 105 6 2.0 0.29
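If you prefer an explicit aggregation step, a different technique (aggregate, then merge) should give the same values. This is a sketch, not part of the answer above; the names sums and result are illustrative. It sums once per ID and then left-merges the totals back onto the original rows, which preserves the original row order:

```python
import pandas as pd

x4 = pd.DataFrame({"ID": [101, 101, 102, 103, 104, 105],
                   "Prob": [1, 1, 1, 1, 1, 1],
                   "Ef": [0, 2, 0, 0, 0.25, 0.29],
                   "W": [2, 2, 3, 4, 5, 6],
                   "EC": [0, 0, 0, 0, 1.6, 2],
                   "Rand": [11, 12, 12, 13, 14, 15]})

# One row per ID: columns 'ID' and 'sum_column'
sums = (x4['Prob'] * x4['Ef']).groupby(x4['ID']).sum().rename('sum_column').reset_index()

# Left merge on the ID column repeats each total on every matching row
result = x4[['ID', 'W', 'EC']].merge(sums, on='ID', how='left')
print(result)
```

transform does the same thing in a single step without the intermediate frame, which is why it is usually the more direct choice.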
If the order of the columns is important, use DataFrame.insert, which places the new column at a given position (here position 1, right after ID):
# Start again from the original x4; insert the sums as the second column
x4.insert(1, 'sum_column', x4['Prob'].mul(x4['Ef']).groupby(x4['ID']).transform('sum'))
x4 = x4.drop(['Ef','Prob', 'Rand'], axis=1)
print(x4)
ID sum_column W EC
0 101 2.00 2 0.0
1 101 2.00 2 0.0
2 102 0.00 3 0.0
3 103 0.00 4 0.0
4 104 0.25 5 1.6
5 105 0.29 6 2.0
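To get exactly the column order from the question (ID, sum_column, EC, W) without modifying x4 in place, a sketch using DataFrame.assign plus a column selection might look like this (the name result is illustrative):

```python
import pandas as pd

x4 = pd.DataFrame({"ID": [101, 101, 102, 103, 104, 105],
                   "Prob": [1, 1, 1, 1, 1, 1],
                   "Ef": [0, 2, 0, 0, 0.25, 0.29],
                   "W": [2, 2, 3, 4, 5, 6],
                   "EC": [0, 0, 0, 0, 1.6, 2],
                   "Rand": [11, 12, 12, 13, 14, 15]})

# assign returns a new frame with sum_column added; x4 itself is unchanged
result = x4.assign(
    sum_column=x4['Prob'].mul(x4['Ef']).groupby(x4['ID']).transform('sum')
)[['ID', 'sum_column', 'EC', 'W']]  # selecting the columns fixes their order
print(result)
```

Selecting the columns by name both drops the unwanted ones and sets the order, so no separate drop call is needed.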
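Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.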