Create a mean column in a DataFrame that depends on other variables of the DataFrame in Pandas

Question

I have code that is roughly like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Group':['a','a','b','b','b','c','c'], 'Label':[0,1,0,1,1,0,1], 'Num':[1,2,3,4,5,6,7]})

I would like to have a new column that is the mean of Num within each Group, computed only from the rows with class label 1. This mean should be applied only to the rows with label 1, with the rest being 0/NaN. The output should be like this: mean = [0,2,0,4.5,4.5,0,7]

Also, how would it look if, instead of 0/NaN, that mean were applied to all rows of the group? i.e. mean = [2,2,4.5,4.5,4.5,7,7]

Thanks a lot.

Answer 1

NaNs/0

If NaNs are OK, just slice the rows with Label 1 before applying a groupby + mean. The result keeps the index of the sliced rows, so assigning it back leaves NaN in every other row:

df['mean'] = df[df['Label'].eq(1)].groupby('Group')['Num'].transform('mean')

Output:

  Group  Label  Num  mean
0     a      0    1   NaN
1     a      1    2   2.0
2     b      0    3   NaN
3     b      1    4   4.5
4     b      1    5   4.5
5     c      0    6   NaN
6     c      1    7   7.0

If you prefer 0, call fillna(0) on the new column after assigning it.
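A minimal sketch of the 0-filled variant, assuming the same df as in the question. The names label1_mean and mean_values are illustrative and not part of the original answer. Note that fillna(0) has to run after the values are aligned to the full index: chaining it directly onto the sliced transform would do nothing, because the NaNs only appear once the shorter Series is aligned with df.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['a', 'a', 'b', 'b', 'b', 'c', 'c'],
    'Label': [0, 1, 0, 1, 1, 0, 1],
    'Num': [1, 2, 3, 4, 5, 6, 7],
})

# Group means computed on the Label == 1 rows only.
# The result is indexed by the surviving rows: 1, 3, 4, 6.
label1_mean = df[df['Label'].eq(1)].groupby('Group')['Num'].transform('mean')

# Align to the full index (missing rows become NaN), then replace NaN with 0.
df['mean'] = label1_mean.reindex(df.index).fillna(0)

mean_values = df['mean'].tolist()
print(mean_values)
```

Plain assignment (df['mean'] = label1_mean) performs the same index alignment implicitly; reindex is used here only to make that step visible.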

Output on all rows

To get the output on all rows, mask the values in Num where Label is not 1 to turn them into NaNs, then groupby the Group values and transform all rows with the mean of the group. Because mean skips NaNs, each group's mean is computed from its Label 1 rows only, yet it is written to every row of the group.

df['mean'] = (df['Num'].mask(df['Label'].ne(1))
                       .groupby(df['Group']).transform('mean'))

Output:

  Group  Label  Num  mean
0     a      0    1   2.0
1     a      1    2   2.0
2     b      0    3   4.5
3     b      1    4   4.5
4     b      1    5   4.5
5     c      0    6   7.0
6     c      1    7   7.0
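The one-liner above can be split into its two steps to see what each does. This is a sketch under the same df as in the question; masked, group_mean and label1_only are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['a', 'a', 'b', 'b', 'b', 'c', 'c'],
    'Label': [0, 1, 0, 1, 1, 0, 1],
    'Num': [1, 2, 3, 4, 5, 6, 7],
})

# Step 1: keep Num where Label is 1, replace every other value with NaN.
masked = df['Num'].mask(df['Label'].ne(1))

# Step 2: group the masked values by Group; mean ignores NaN,
# and transform broadcasts each group's mean back to all of its rows.
group_mean = masked.groupby(df['Group']).transform('mean')

# Optional: recover the first variant by zeroing the rows whose Label is not 1.
label1_only = group_mean.where(df['Label'].eq(1), 0)

print(masked.tolist())
print(group_mean.tolist())
print(label1_only.tolist())
```

Starting from the all-rows result and applying where is one way to get both requested outputs from a single groupby.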

Answer 1: 2, accepted, 2021-10-17 15:17:25
