Creating a mean column in a dataframe dependent on other variables of the dataframe in pandas
I have code that is roughly like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Group':['a','a','b','b','b','c','c'], 'Label':[0,1,0,1,1,0,1], 'Num':[1,2,3,4,5,6,7]})
I would like to have a new column that is the mean of Num within each Group, computed only from the rows with class label 1. This mean should be applied only to the rows with label 1, with the rest being 0/NaN. The output should be like this: mean = [0,2,0,4.5,4.5,0,7]
Also, how would it look if, instead of 0/NaN, that mean were applied to all rows of the group? i.e. mean = [2,2,4.5,4.5,4.5,7,7]
Thanks a lot
If NaNs are OK, just slice before applying a groupby + mean:
df['mean'] = df[df['Label'].eq(1)].groupby('Group')['Num'].transform('mean')
output:
  Group  Label  Num  mean
0     a      0    1   NaN
1     a      1    2   2.0
2     b      0    3   NaN
3     b      1    4   4.5
4     b      1    5   4.5
5     c      0    6   NaN
6     c      1    7   7.0
If you prefer 0, you can fillna(0) on the new column afterwards.
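A minimal sketch of the fillna(0) variant, reusing the dataframe from the question. Note that the NaNs only appear when the sliced result is assigned back to df (the rows with Label 0 have no matching index), so fillna is applied to the column after the assignment rather than inside the groupby chain:

```python
import pandas as pd

df = pd.DataFrame({'Group': ['a', 'a', 'b', 'b', 'b', 'c', 'c'],
                   'Label': [0, 1, 0, 1, 1, 0, 1],
                   'Num': [1, 2, 3, 4, 5, 6, 7]})

# Group mean computed only on the rows where Label == 1;
# rows with Label == 0 get NaN through index alignment on assignment.
df['mean'] = df[df['Label'].eq(1)].groupby('Group')['Num'].transform('mean')

# Replace the NaNs left on the Label == 0 rows with 0.
df['mean'] = df['mean'].fillna(0)

print(df['mean'].tolist())
```

This should give the first output requested in the question, with 0 on every row whose Label is not 1.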
To get the output on all rows, mask the values in Num where Label is not 1 to change them into NaNs, then groupby the Group values and transform all rows with the mean of the group:
df['mean'] = (df['Num'].mask(df['Label'].ne(1))
.groupby(df['Group']).transform('mean'))
output:
  Group  Label  Num  mean
0     a      0    1   2.0
1     a      1    2   2.0
2     b      0    3   4.5
3     b      1    4   4.5
4     b      1    5   4.5
5     c      0    6   7.0
6     c      1    7   7.0
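As a possible variation (not part of the original answer), both requested outputs can be derived from the same group-wide mean. The sketch below uses Series.where, which keeps values where the condition holds, as the mirror image of mask; the column names mean_all and mean_label1 are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Group': ['a', 'a', 'b', 'b', 'b', 'c', 'c'],
                   'Label': [0, 1, 0, 1, 1, 0, 1],
                   'Num': [1, 2, 3, 4, 5, 6, 7]})

is_label1 = df['Label'].eq(1)

# Keep Num only where Label == 1 (NaN elsewhere), then broadcast the
# per-group mean to every row; 'mean' skips NaN values by default.
group_mean = df['Num'].where(is_label1).groupby(df['Group']).transform('mean')

df['mean_all'] = group_mean                          # mean on every row of the group
df['mean_label1'] = group_mean.where(is_label1, 0)   # 0 on rows with Label != 1

print(df)
```

One caveat worth keeping in mind: a group that has no rows with Label 1 ends up with NaN in mean_all, since there is nothing to average.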