Replace all-zero columns in one DataFrame with the mean of the same-named columns in another DataFrame

Question

I have two data frames, df1 and df2, with the same number of columns and the same column names, but different numbers of rows. Many columns in df2 contain only zeros.

I would like every all-zero column in df2 to be replaced with the mean of the column with the same name in df1.

So, if df1 has a structure like:

Column1 Column2 ------    Column n
0.4      2.3               1.7
0.7      2.5               1.4
0.1      2.1               1.2

and df2 has a structure like:

Column1 Column2 ------    Column n
0        2.3               1.7
0        2.5               1.4
0        2.1               1.2

I would like to replace Column1 (and any other all-zero columns in df2) with the mean of the corresponding column in df1. In the end, df2 would look like:

Column1 Column2 ------    Column n
0.4      2.3               1.7
0.4      2.5               1.4
0.4      2.1               1.2

(All zero values in Column1 of df2 are replaced with the mean of Column1 in df1.)

I am fairly new to this and have looked at other options such as fillna() and replace(), but have been unable to accomplish exactly what I want. Any help is highly appreciated.

Answer 1

Use DataFrame.mask with mean:

# axis=1 aligns the Series of df1 column means with df2's column labels
df = df2.mask(df2 == 0, df1.mean(), axis=1)
print(df)
   Column1  Column2  Column n
0      0.4      2.3       1.7
1      0.4      2.5       1.4
2      0.4      2.1       1.2
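
For a reproducible run, the frames from the question can be built by hand. This is only a sketch: the values are copied from the question's sample tables, the variable names sample_df1, sample_df2 and filled are placeholders introduced here, and the zeros are written as floats so the column dtype does not change.

```python
import pandas as pd

# Sample data copied from the question's tables
sample_df1 = pd.DataFrame({
    "Column1": [0.4, 0.7, 0.1],
    "Column2": [2.3, 2.5, 2.1],
    "Column n": [1.7, 1.4, 1.2],
})
sample_df2 = pd.DataFrame({
    "Column1": [0.0, 0.0, 0.0],
    "Column2": [2.3, 2.5, 2.1],
    "Column n": [1.7, 1.4, 1.2],
})

# Wherever sample_df2 is zero, take the mean of the same-named column in sample_df1
filled = sample_df2.mask(sample_df2 == 0, sample_df1.mean(), axis=1)
print(filled)
```

The non-zero columns pass through unchanged, and sample_df2 itself is not modified because mask returns a new DataFrame.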

A NumPy alternative with numpy.where should be faster on large DataFrames:

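import numpy as np
import pandas as pd

# Keep df2's own values where they are non-zero; df1.mean() broadcasts across the columns
df = pd.DataFrame(np.where(df2 == 0, df1.mean(), df2),
                  index=df2.index,
                  columns=df2.columns)
print(df)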
   Column1  Column2  Column n
0      0.4      2.3       1.7
1      0.4      2.5       1.4
2      0.4      2.1       1.2
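
Both approaches above replace every zero in df2, including isolated zeros in columns that are otherwise non-zero. If only columns that are entirely zero should be touched, as the question describes, one possible sketch is a small loop over the columns. The helper name fill_all_zero_columns and the sample values are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

def fill_all_zero_columns(target, source):
    """Hypothetical helper: return a copy of `target` in which every
    column consisting only of zeros is set to the mean of the
    same-named column in `source`."""
    out = target.copy()
    for col in out.columns:
        if (out[col] == 0).all():
            # A scalar assignment fills the whole column
            out[col] = source[col].mean()
    return out

# Illustrative data: the frames have different row counts,
# and Column2 has a single zero that should be left alone
demo_df1 = pd.DataFrame({"Column1": [0.4, 0.7, 0.1],
                         "Column2": [2.3, 2.5, 2.1]})
demo_df2 = pd.DataFrame({"Column1": [0.0, 0.0],
                         "Column2": [0.0, 2.5]})

result = fill_all_zero_columns(demo_df2, demo_df1)
print(result)
```

Here Column1 is filled with the mean from demo_df1, while the lone zero in Column2 is kept because that column is not entirely zero.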


Answer 1: score 3, accepted, 2019-02-28 11:29:35
