Pandas: create new column with group means conditional on another column
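I am trying to create a new column that contains group means, where the mean is computed only over rows that satisfy a condition on another column. This is best explained with an example: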
import pandas as pd

df = pd.DataFrame({'A': [59000000, 65000000, 434000, 434000, 434000, 337000, 11300, 11300, 11300],
                   'B': [1, 1, 0, 1, 0, 0, 1, 1, 0],
                   'group': ["IT", "IT", "IT", "MV", "MV", "MV", "IT", "MV", "MV"]})
df
          A  B group
0  59000000  1    IT
1  65000000  1    IT
2    434000  0    IT
3    434000  1    MV
4    434000  0    MV
5    337000  0    MV
6     11300  1    IT
7     11300  1    MV
8     11300  0    MV
I have managed to solve this, but I am looking for something that takes fewer lines of code and is possibly more efficient.
x = df.loc[df['B']==1].groupby('group', as_index=False)['A'].mean()
x.rename(columns = {'A':'a'}, inplace = True)
df = pd.merge(df, x, how='left', on='group')
          A  B group         a
0  59000000  1    IT  41337100
1  65000000  1    IT  41337100
2    434000  0    IT  41337100
3    434000  1    MV    222650
4    434000  0    MV    222650
5    337000  0    MV    222650
6     11300  1    IT  41337100
7     11300  1    MV    222650
8     11300  0    MV    222650
I tried using the transform function, but it did not work for me:
df.loc[: , 'a'] = df.groupby('group').transform(lambda x: x[x['B']==1]['A'].mean())
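One plausible reason the attempt above fails is that the function passed to transform may be handed one column at a time, so `x['B']` is not available inside the lambda. The following is only a sketch of a shorter variant of the merge approach, not the accepted answer: it computes the conditional means once and maps them back by group label. The name `means` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [59000000, 65000000, 434000, 434000, 434000, 337000, 11300, 11300, 11300],
    'B': [1, 1, 0, 1, 0, 0, 1, 1, 0],
    'group': ["IT", "IT", "IT", "MV", "MV", "MV", "IT", "MV", "MV"],
})

# Mean of A per group, using only the rows where B == 1.
# `means` (hypothetical name) is a Series indexed by group label.
means = df.loc[df['B'].eq(1)].groupby('group')['A'].mean()

# Look up each row's group label in `means` to broadcast the value to every row.
df['a'] = df['group'].map(means)
```

A group with no rows where B == 1 would simply get NaN in the new column, because its label is missing from `means`.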
Use Series.where to keep only the values of column A that you need, then groupby and transform:
df['a'] = df['A'].where(df['B'].eq(1)).groupby(df['group']).transform('mean')
[out]
          A  B group           a
0  59000000  1    IT  41337100.0
1  65000000  1    IT  41337100.0
2    434000  0    IT  41337100.0
3    434000  1    MV    222650.0
4    434000  0    MV    222650.0
5    337000  0    MV    222650.0
6     11300  1    IT  41337100.0
7     11300  1    MV    222650.0
8     11300  0    MV    222650.0
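To see why the one-liner works, it can help to split it into its two steps. The sketch below rebuilds the example frame, shows the intermediate masked Series, and cross-checks the result against the merge approach from the question. The names `masked`, `expected`, `check` and `a_merge` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [59000000, 65000000, 434000, 434000, 434000, 337000, 11300, 11300, 11300],
    'B': [1, 1, 0, 1, 0, 0, 1, 1, 0],
    'group': ["IT", "IT", "IT", "MV", "MV", "MV", "IT", "MV", "MV"],
})

# Step 1: keep A only where B == 1; every other row becomes NaN.
masked = df['A'].where(df['B'].eq(1))

# Step 2: group the masked values by the 'group' column and broadcast each
# group's mean back to all of its rows. The mean skips NaN by default,
# so rows with B == 0 do not contribute to it.
df['a'] = masked.groupby(df['group']).transform('mean')

# Cross-check against the filter + groupby + merge approach from the question.
expected = (
    df.loc[df['B'] == 1]
      .groupby('group', as_index=False)['A'].mean()
      .rename(columns={'A': 'a_merge'})
)
check = df.merge(expected, how='left', on='group')
print((check['a'] - check['a_merge']).abs().max())
```

Because the masked Series contains NaN, the new column is float rather than integer, which is why the output above shows a trailing `.0`.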