Fill NaN values with the mean based on a specific value in another column
I want to fill the NaN values in column c of my DataFrame with the mean, but only for rows whose Category is B, and leave the other rows untouched.
print (df)
Category b c
0 A 1 5.0
1 C 1 NaN
2 A 1 4.0
3 B 2 NaN
4 A 2 1.0
5 B 2 NaN
6 C 1 3.0
7 C 1 2.0
8 B 1 NaN
What I'm doing at the moment is:
df.c = df.c.fillna(df.c.mean())
But this fills all the NaN values, whereas I only want to fill the rows with index 3, 5 and 8, whose Category value is B.
Combine fillna with slice assignment:
df.loc[df.Category.eq('B'), 'c'] = (df.loc[df.Category.eq('B'), 'c'].
fillna(df.c.mean()))
Out[736]:
Category b c
0 A 1 5.0
1 C 1 NaN
2 A 1 4.0
3 B 2 3.0
4 A 2 1.0
5 B 2 3.0
6 C 1 3.0
7 C 1 2.0
8 B 1 3.0
Or use a direct assignment with two masks. pandas.Series.eq (like pandas.DataFrame.eq) is the element-wise equality operator, so df.Category.eq('B') is equivalent to df.Category == 'B'.
df.loc[df.Category.eq('B') & df.c.isna(), 'c'] = df.c.mean()
Out[745]:
Category b c
0 A 1 5.0
1 C 1 NaN
2 A 1 4.0
3 B 2 3.0
4 A 2 1.0
5 B 2 3.0
6 C 1 3.0
7 C 1 2.0
8 B 1 3.0
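For anyone who wants to run the two snippets above end to end, here is a minimal, self-contained sketch. The frame is rebuilt by hand from the printed table in the question, and the names `overall_mean`, `is_b`, `by_slice` and `by_mask` are illustrative, not taken from the answer.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question's printout.
df = pd.DataFrame({
    "Category": list("ACABABCCB"),
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1],
    "c": [5.0, np.nan, 4.0, np.nan, 1.0, np.nan, 3.0, 2.0, np.nan],
})

# Mean of the non-missing values in c (NaN is skipped by default).
overall_mean = df["c"].mean()

# Boolean mask selecting the category-B rows.
is_b = df["Category"].eq("B")

# Approach 1: fillna on the B slice, assigned back to the same slice.
by_slice = df.copy()
by_slice.loc[is_b, "c"] = by_slice.loc[is_b, "c"].fillna(overall_mean)

# Approach 2: a single assignment guarded by two masks.
by_mask = df.copy()
by_mask.loc[is_b & by_mask["c"].isna(), "c"] = overall_mean

print(by_mask)
```

Both copies end up identical: rows 3, 5 and 8 hold 3.0, while row 1 (category C) keeps its NaN. Note that the fill value is the mean of the whole column, as in the answer, because every B row in this sample is missing.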
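This would be the answer to your question:
import pandas as pd
# row['c'] is a scalar inside apply, so check it with pd.isna instead of calling .fillna on it
df.c = df.apply(
    lambda row: df.c.mean() if row['Category'] == 'B' and pd.isna(row['c']) else row['c'], axis=1)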
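Because apply with axis=1 calls a Python function once per row, a vectorised variant may be preferable on larger frames. The sketch below is one possible way to do it with Series.mask, which replaces values where the condition is True. The small frame and the name `needs_fill` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, smaller than the frame in the question.
df = pd.DataFrame({
    "Category": ["A", "B", "C", "B"],
    "c": [2.0, np.nan, np.nan, 4.0],
})

# True only for category-B rows whose c is missing.
needs_fill = df["Category"].eq("B") & df["c"].isna()

# mask() swaps in the replacement where the condition is True
# and keeps the original value everywhere else.
df["c"] = df["c"].mask(needs_fill, df["c"].mean())

print(df)
```

Here only the second row is filled (with 3.0, the mean of 2.0 and 4.0). The missing value in the category-C row is left as NaN.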